| | |
- builtins.object
-
- pandas.core.flags.Flags
- pandas.core.groupby.grouper.Grouper
- pandas.io.excel._base.ExcelFile
- pandas.io.excel._base.ExcelWriter
- pandas.io.pytables.HDFStore
- builtins.tuple(builtins.object)
-
- pandas.core.groupby.generic.NamedAgg
- contextlib.ContextDecorator(builtins.object)
-
- pandas._config.config.option_context
- pandas._libs.interval.IntervalMixin(builtins.object)
-
- pandas._libs.interval.Interval
- pandas._libs.tslibs.dtypes.PeriodDtypeBase(builtins.object)
-
- pandas.core.dtypes.dtypes.PeriodDtype(pandas._libs.tslibs.dtypes.PeriodDtypeBase, pandas.core.dtypes.dtypes.PandasExtensionDtype)
- pandas._libs.tslibs.offsets.RelativeDeltaOffset(pandas._libs.tslibs.offsets.BaseOffset)
-
- pandas._libs.tslibs.offsets.DateOffset
- pandas._libs.tslibs.period._Period(pandas._libs.tslibs.period.PeriodMixin)
-
- pandas._libs.tslibs.period.Period
- pandas._libs.tslibs.timedeltas._Timedelta(datetime.timedelta)
-
- pandas._libs.tslibs.timedeltas.Timedelta
- pandas._libs.tslibs.timestamps._Timestamp(pandas._libs.tslibs.base.ABCTimestamp)
-
- pandas._libs.tslibs.timestamps.Timestamp
- pandas.core.arraylike.OpsMixin(builtins.object)
-
- pandas.core.frame.DataFrame(pandas.core.generic.NDFrame, pandas.core.arraylike.OpsMixin)
- pandas.core.arrays._mixins.NDArrayBackedExtensionArray(pandas._libs.arrays.NDArrayBacked, pandas.core.arrays.base.ExtensionArray)
-
- pandas.core.arrays.categorical.Categorical(pandas.core.arrays._mixins.NDArrayBackedExtensionArray, pandas.core.base.PandasObject, pandas.core.strings.object_array.ObjectStringArrayMixin)
- pandas.core.arrays.floating.FloatingDtype(pandas.core.arrays.numeric.NumericDtype)
-
- pandas.core.arrays.floating.Float32Dtype
- pandas.core.arrays.floating.Float64Dtype
- pandas.core.arrays.integer._IntegerDtype(pandas.core.arrays.numeric.NumericDtype)
-
- pandas.core.arrays.integer.Int16Dtype
- pandas.core.arrays.integer.Int32Dtype
- pandas.core.arrays.integer.Int64Dtype
- pandas.core.arrays.integer.Int8Dtype
- pandas.core.arrays.integer.UInt16Dtype
- pandas.core.arrays.integer.UInt32Dtype
- pandas.core.arrays.integer.UInt64Dtype
- pandas.core.arrays.integer.UInt8Dtype
- pandas.core.arrays.masked.BaseMaskedDtype(pandas.core.dtypes.base.ExtensionDtype)
-
- pandas.core.arrays.boolean.BooleanDtype
- pandas.core.base.IndexOpsMixin(pandas.core.arraylike.OpsMixin)
-
- pandas.core.indexes.base.Index(pandas.core.base.IndexOpsMixin, pandas.core.base.PandasObject)
-
- pandas.core.indexes.multi.MultiIndex
- pandas.core.series.Series(pandas.core.base.IndexOpsMixin, pandas.core.generic.NDFrame)
- pandas.core.base.PandasObject(pandas.core.accessor.DirNamesMixin)
-
- pandas.core.arrays.categorical.Categorical(pandas.core.arrays._mixins.NDArrayBackedExtensionArray, pandas.core.base.PandasObject, pandas.core.strings.object_array.ObjectStringArrayMixin)
- pandas.core.indexes.base.Index(pandas.core.base.IndexOpsMixin, pandas.core.base.PandasObject)
-
- pandas.core.indexes.multi.MultiIndex
- pandas.core.dtypes.base.ExtensionDtype(builtins.object)
-
- pandas.core.arrays.sparse.dtype.SparseDtype
- pandas.core.arrays.string_.StringDtype
- pandas.core.dtypes.dtypes.CategoricalDtype(pandas.core.dtypes.dtypes.PandasExtensionDtype, pandas.core.dtypes.base.ExtensionDtype)
- pandas.core.dtypes.dtypes.PandasExtensionDtype(pandas.core.dtypes.base.ExtensionDtype)
-
- pandas.core.dtypes.dtypes.CategoricalDtype(pandas.core.dtypes.dtypes.PandasExtensionDtype, pandas.core.dtypes.base.ExtensionDtype)
- pandas.core.dtypes.dtypes.DatetimeTZDtype
- pandas.core.dtypes.dtypes.IntervalDtype
- pandas.core.dtypes.dtypes.PeriodDtype(pandas._libs.tslibs.dtypes.PeriodDtypeBase, pandas.core.dtypes.dtypes.PandasExtensionDtype)
- pandas.core.generic.NDFrame(pandas.core.base.PandasObject, pandas.core.indexing.IndexingMixin)
-
- pandas.core.frame.DataFrame(pandas.core.generic.NDFrame, pandas.core.arraylike.OpsMixin)
- pandas.core.series.Series(pandas.core.base.IndexOpsMixin, pandas.core.generic.NDFrame)
- pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin(pandas.core.indexes.extension.NDArrayBackedExtensionIndex)
-
- pandas.core.indexes.period.PeriodIndex
- pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin(pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin)
-
- pandas.core.indexes.datetimes.DatetimeIndex
- pandas.core.indexes.timedeltas.TimedeltaIndex
- pandas.core.indexes.extension.ExtensionIndex(pandas.core.indexes.base.Index)
-
- pandas.core.indexes.interval.IntervalIndex
- pandas.core.indexes.extension.NDArrayBackedExtensionIndex(pandas.core.indexes.extension.ExtensionIndex)
-
- pandas.core.indexes.category.CategoricalIndex
- pandas.core.indexes.numeric.NumericIndex(pandas.core.indexes.base.Index)
-
- pandas.core.indexes.range.RangeIndex
- pandas.core.strings.object_array.ObjectStringArrayMixin(pandas.core.strings.base.BaseStringArrayMethods)
-
- pandas.core.arrays.categorical.Categorical(pandas.core.arrays._mixins.NDArrayBackedExtensionArray, pandas.core.base.PandasObject, pandas.core.strings.object_array.ObjectStringArrayMixin)
class BooleanDtype(pandas.core.arrays.masked.BaseMaskedDtype) |
| |
Extension dtype for boolean data.
.. versionadded:: 1.0.0
.. warning::
BooleanDtype is considered experimental. The implementation and
parts of the API may change without warning.
Attributes
----------
None
Methods
-------
None
Examples
--------
>>> pd.BooleanDtype()
BooleanDtype |
| |
- Method resolution order:
- BooleanDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BooleanArray'
- Construct BooleanArray from pyarrow Array/ChunkedArray.
- __repr__(self) -> 'str'
- Return repr(self).
Class methods defined here:
- construct_array_type() -> 'type_t[BooleanArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Readonly properties defined here:
- kind
- A character code (one of 'biufcmMOSUV'), default 'O'
This should match the NumPy dtype used when the array is
converted to an ndarray, which is probably 'O' for object if
the extension type cannot be represented as a built-in NumPy
type.
See Also
--------
numpy.dtype.kind
- numpy_dtype
- Return an instance of our numpy dtype
- type
- The scalar type for the array, e.g. ``int``
It's expected ``ExtensionArray[item]`` returns an instance
of ``ExtensionDtype.type`` for scalar ``item``, assuming
that value is valid (not NA). NA values do not need to be
instances of `type`.
Data and other attributes defined here:
- name = 'boolean'
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Categorical(pandas.core.arrays._mixins.NDArrayBackedExtensionArray, pandas.core.base.PandasObject, pandas.core.strings.object_array.ObjectStringArrayMixin) |
| |
Categorical(values, categories=None, ordered=None, dtype: 'Dtype | None' = None, fastpath: 'bool' = False, copy: 'bool' = True)
Represent a categorical variable in classic R / S-plus fashion.
`Categoricals` can only take on only a limited, and usually fixed, number
of possible values (`categories`). In contrast to statistical categorical
variables, a `Categorical` might have an order, but numerical operations
(additions, divisions, ...) are not possible.
All values of the `Categorical` are either in `categories` or `np.nan`.
Assigning values outside of `categories` will raise a `ValueError`. Order
is defined by the order of the `categories`, not lexical order of the
values.
Parameters
----------
values : list-like
The values of the categorical. If categories are given, values not in
categories will be replaced with NaN.
categories : Index-like (unique), optional
The unique categories for this categorical. If not given, the
categories are assumed to be the unique values of `values` (sorted, if
possible, otherwise in the order in which they appear).
ordered : bool, default False
Whether or not this categorical is treated as a ordered categorical.
If True, the resulting categorical will be ordered.
An ordered categorical respects, when sorted, the order of its
`categories` attribute (which in turn is the `categories` argument, if
provided).
dtype : CategoricalDtype
An instance of ``CategoricalDtype`` to use for this categorical.
Attributes
----------
categories : Index
The categories of this categorical
codes : ndarray
The codes (integer positions, which point to the categories) of this
categorical, read only.
ordered : bool
Whether or not this Categorical is ordered.
dtype : CategoricalDtype
The instance of ``CategoricalDtype`` storing the ``categories``
and ``ordered``.
Methods
-------
from_codes
__array__
Raises
------
ValueError
If the categories do not validate.
TypeError
If an explicit ``ordered=True`` is given but no `categories` and the
`values` are not sortable.
See Also
--------
CategoricalDtype : Type for categorical data.
CategoricalIndex : An Index with an underlying ``Categorical``.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html>`__
for more.
Examples
--------
>>> pd.Categorical([1, 2, 3, 1, 2, 3])
[1, 2, 3, 1, 2, 3]
Categories (3, int64): [1, 2, 3]
>>> pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
Missing values are not included as a category.
>>> c = pd.Categorical([1, 2, 3, 1, 2, 3, np.nan])
>>> c
[1, 2, 3, 1, 2, 3, NaN]
Categories (3, int64): [1, 2, 3]
However, their presence is indicated in the `codes` attribute
by code `-1`.
>>> c.codes
array([ 0, 1, 2, 0, 1, 2, -1], dtype=int8)
Ordered `Categoricals` can be sorted according to the custom order
of the categories and can have a min and max value.
>>> c = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ordered=True,
... categories=['c', 'b', 'a'])
>>> c
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['c' < 'b' < 'a']
>>> c.min()
'c' |
| |
- Method resolution order:
- Categorical
- pandas.core.arrays._mixins.NDArrayBackedExtensionArray
- pandas._libs.arrays.NDArrayBacked
- pandas.core.arrays.base.ExtensionArray
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- pandas.core.strings.object_array.ObjectStringArrayMixin
- pandas.core.strings.base.BaseStringArrayMethods
- abc.ABC
- builtins.object
Methods defined here:
- __array__(self, dtype: 'NpDtype | None' = None) -> 'np.ndarray'
- The numpy array interface.
Returns
-------
numpy.array
A numpy array of either the specified dtype or,
if dtype==None (default), the same dtype as
categorical.categories.dtype.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str', *inputs, **kwargs)
- __contains__(self, key) -> 'bool'
- Returns True if `key` is in this Categorical.
- __eq__(self, other)
- __ge__(self, other)
- __gt__(self, other)
- __init__(self, values, categories=None, ordered=None, dtype: 'Dtype | None' = None, fastpath: 'bool' = False, copy: 'bool' = True)
- Initialize self. See help(type(self)) for accurate signature.
- __iter__(self)
- Returns an Iterator over the values of this Categorical.
- __le__(self, other)
- __lt__(self, other)
- __ne__(self, other)
- __repr__(self) -> 'str'
- String representation.
- __setstate__(self, state)
- Necessary for making this object picklable
- add_categories(self, new_categories, inplace=<no_default>)
- Add new categories.
`new_categories` will be included at the last/highest place in the
categories and will be unused directly after this call.
Parameters
----------
new_categories : category or list-like of category
The new categories to be included.
inplace : bool, default False
Whether or not to add the categories inplace or return a copy of
this categorical with added categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with new categories added or None if ``inplace=True``.
Raises
------
ValueError
If the new categories include old categories or do not validate as
categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['c', 'b', 'c'])
>>> c
['c', 'b', 'c']
Categories (2, object): ['b', 'c']
>>> c.add_categories(['d', 'a'])
['c', 'b', 'c']
Categories (4, object): ['b', 'c', 'd', 'a']
- argsort(self, ascending=True, kind='quicksort', **kwargs)
- Return the indices that would sort the Categorical.
.. versionchanged:: 0.25.0
Changed to sort missing values at the end.
Parameters
----------
ascending : bool, default True
Whether the indices should result in an ascending
or descending sort.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
Sorting algorithm.
**kwargs:
passed through to :func:`numpy.argsort`.
Returns
-------
np.ndarray[np.intp]
See Also
--------
numpy.ndarray.argsort
Notes
-----
While an ordering is applied to the category values, arg-sorting
in this context refers more to organizing and grouping together
based on matching category values. Thus, this function can be
called on an unordered Categorical instance unlike the functions
'Categorical.min' and 'Categorical.max'.
Examples
--------
>>> pd.Categorical(['b', 'b', 'a', 'c']).argsort()
array([2, 0, 1, 3])
>>> cat = pd.Categorical(['b', 'b', 'a', 'c'],
... categories=['c', 'b', 'a'],
... ordered=True)
>>> cat.argsort()
array([3, 0, 1, 2])
Missing values are placed at the end
>>> cat = pd.Categorical([2, None, 1])
>>> cat.argsort()
array([2, 0, 1])
- as_ordered(self, inplace=False)
- Set the Categorical to be ordered.
Parameters
----------
inplace : bool, default False
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to True.
Returns
-------
Categorical or None
Ordered Categorical or None if ``inplace=True``.
- as_unordered(self, inplace=False)
- Set the Categorical to be unordered.
Parameters
----------
inplace : bool, default False
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to False.
Returns
-------
Categorical or None
Unordered Categorical or None if ``inplace=True``.
- astype(self, dtype: 'AstypeArg', copy: 'bool' = True) -> 'ArrayLike'
- Coerce this type to another dtype
Parameters
----------
dtype : numpy dtype or pandas type
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and dtype is categorical, the original
object is returned.
- check_for_ordered(self, op)
- assert that we are ordered
- describe(self)
- Describes this Categorical
Returns
-------
description: `DataFrame`
A dataframe with frequency and counts by category.
- equals(self, other: 'object') -> 'bool'
- Returns True if categorical arrays are equal.
Parameters
----------
other : `Categorical`
Returns
-------
bool
- is_dtype_equal(self, other) -> 'bool'
- isin(self, values) -> 'npt.NDArray[np.bool_]'
- Check whether `values` are contained in Categorical.
Return a boolean NumPy Array showing whether each element in
the Categorical matches an element in the passed sequence of
`values` exactly.
Parameters
----------
values : set or list-like
The sequence of values to test. Passing in a single string will
raise a ``TypeError``. Instead, turn a single string into a
list of one element.
Returns
-------
np.ndarray[bool]
Raises
------
TypeError
* If `values` is not a set or list-like
See Also
--------
pandas.Series.isin : Equivalent method on Series.
Examples
--------
>>> s = pd.Categorical(['lama', 'cow', 'lama', 'beetle', 'lama',
... 'hippo'])
>>> s.isin(['cow', 'lama'])
array([ True, True, True, False, True, False])
Passing a single string as ``s.isin('lama')`` will raise an error. Use
a list of one element instead:
>>> s.isin(['lama'])
array([ True, False, True, False, True, False])
- isna(self) -> 'np.ndarray'
- Detect missing values
Missing values (-1 in .codes) are detected.
Returns
-------
np.ndarray[bool] of whether my values are null
See Also
--------
isna : Top-level isna.
isnull : Alias of isna.
Categorical.notna : Boolean inverse of Categorical.isna.
- isnull = isna(self) -> 'np.ndarray'
- map(self, mapper)
- Map categories using an input mapping or function.
Maps the categories to new categories. If the mapping correspondence is
one-to-one the result is a :class:`~pandas.Categorical` which has the
same order property as the original, otherwise a :class:`~pandas.Index`
is returned. NaN values are unaffected.
If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to `NaN`. Note that if this happens an :class:`~pandas.Index`
will be returned.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
Returns
-------
pandas.Categorical or pandas.Index
Mapped categorical.
See Also
--------
CategoricalIndex.map : Apply a mapping correspondence on a
:class:`~pandas.CategoricalIndex`.
Index.map : Apply a mapping correspondence on an
:class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
:class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
:class:`~pandas.Series`.
Examples
--------
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> cat.map(lambda x: x.upper())
['A', 'B', 'C']
Categories (3, object): ['A', 'B', 'C']
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'third'})
['first', 'second', 'third']
Categories (3, object): ['first', 'second', 'third']
If the mapping is one-to-one the ordering of the categories is
preserved:
>>> cat = pd.Categorical(['a', 'b', 'c'], ordered=True)
>>> cat
['a', 'b', 'c']
Categories (3, object): ['a' < 'b' < 'c']
>>> cat.map({'a': 3, 'b': 2, 'c': 1})
[3, 2, 1]
Categories (3, int64): [3 < 2 < 1]
If the mapping is not one-to-one an :class:`~pandas.Index` is returned:
>>> cat.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')
If a `dict` is used, all unmapped categories are mapped to `NaN` and
the result is an :class:`~pandas.Index`:
>>> cat.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')
- max(self, *, skipna=True, **kwargs)
- The maximum value of the object.
Only ordered `Categoricals` have a maximum!
.. versionchanged:: 1.0.0
Returns an NA value on empty arrays
Raises
------
TypeError
If the `Categorical` is not `ordered`.
Returns
-------
max : the maximum of this `Categorical`
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of my values
Parameters
----------
deep : bool
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption
Returns
-------
bytes used
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False
See Also
--------
numpy.ndarray.nbytes
- min(self, *, skipna=True, **kwargs)
- The minimum value of the object.
Only ordered `Categoricals` have a minimum!
.. versionchanged:: 1.0.0
Returns an NA value on empty arrays
Raises
------
TypeError
If the `Categorical` is not `ordered`.
Returns
-------
min : the minimum of this `Categorical`
- mode(self, dropna: 'bool' = True) -> 'Categorical'
- Returns the mode(s) of the Categorical.
Always returns `Categorical` even if only one value.
Parameters
----------
dropna : bool, default True
Don't consider counts of NaN/NaT.
Returns
-------
modes : `Categorical` (sorted)
- notna(self) -> 'np.ndarray'
- Inverse of isna
Both missing values (-1 in .codes) and NA as a category are detected as
null.
Returns
-------
np.ndarray[bool] of whether my values are not null
See Also
--------
notna : Top-level notna.
notnull : Alias of notna.
Categorical.isna : Boolean inverse of Categorical.notna.
- notnull = notna(self) -> 'np.ndarray'
- remove_categories(self, removals, inplace=<no_default>)
- Remove the specified categories.
`removals` must be included in the old categories. Values which were in
the removed categories will be set to NaN
Parameters
----------
removals : category or list of categories
The categories which should be removed.
inplace : bool, default False
Whether or not to remove the categories inplace or return a copy of
this categorical with removed categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with removed categories or None if ``inplace=True``.
Raises
------
ValueError
If the removals are not contained in the categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['a', 'c', 'b', 'c', 'd'])
>>> c
['a', 'c', 'b', 'c', 'd']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> c.remove_categories(['d', 'a'])
[NaN, 'c', 'b', 'c', NaN]
Categories (2, object): ['b', 'c']
- remove_unused_categories(self, inplace=<no_default>)
- Remove categories which are not used.
Parameters
----------
inplace : bool, default False
Whether or not to drop unused categories inplace or return a copy of
this categorical with unused categories dropped.
.. deprecated:: 1.2.0
Returns
-------
cat : Categorical or None
Categorical with unused categories dropped or None if ``inplace=True``.
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['a', 'c', 'b', 'c', 'd'])
>>> c
['a', 'c', 'b', 'c', 'd']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> c[2] = 'a'
>>> c[4] = 'c'
>>> c
['a', 'c', 'a', 'c', 'c']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> c.remove_unused_categories()
['a', 'c', 'a', 'c', 'c']
Categories (2, object): ['a', 'c']
- rename_categories(self, new_categories, inplace=<no_default>)
- Rename categories.
Parameters
----------
new_categories : list-like, dict-like or callable
New categories which will replace old categories.
* list-like: all items must be unique and the number of items in
the new categories must match the existing number of categories.
* dict-like: specifies a mapping from
old categories to new. Categories not contained in the mapping
are passed through and extra categories in the mapping are
ignored.
* callable : a callable that is called on all items in the old
categories and whose return values comprise the new categories.
inplace : bool, default False
Whether or not to rename the categories inplace or return a copy of
this categorical with renamed categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with removed categories or None if ``inplace=True``.
Raises
------
ValueError
If new categories are list-like and do not have the same number of
items than the current categories or do not validate as categories
See Also
--------
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['a', 'a', 'b'])
>>> c.rename_categories([0, 1])
[0, 0, 1]
Categories (2, int64): [0, 1]
For dict-like ``new_categories``, extra keys are ignored and
categories not in the dictionary are passed through
>>> c.rename_categories({'a': 'A', 'c': 'C'})
['A', 'A', 'b']
Categories (2, object): ['A', 'b']
You may also provide a callable to create the new categories
>>> c.rename_categories(lambda x: x.upper())
['A', 'A', 'B']
Categories (2, object): ['A', 'B']
- reorder_categories(self, new_categories, ordered=None, inplace=<no_default>)
- Reorder categories as specified in new_categories.
`new_categories` need to include all old categories and no new category
items.
Parameters
----------
new_categories : Index-like
The categories in new order.
ordered : bool, optional
Whether or not the categorical is treated as a ordered categorical.
If not given, do not change the ordered information.
inplace : bool, default False
Whether or not to reorder the categories inplace or return a copy of
this categorical with reordered categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with removed categories or None if ``inplace=True``.
Raises
------
ValueError
If the new categories do not contain all old category items or any
new ones
See Also
--------
rename_categories : Rename categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
- replace(self, to_replace, value, inplace: 'bool' = False)
- Replaces all instances of one value with another
Parameters
----------
to_replace: object
The value to be replaced
value: object
The value to replace it with
inplace: bool
Whether the operation is done in-place
Returns
-------
None if inplace is True, otherwise the new Categorical after replacement
Examples
--------
>>> s = pd.Categorical([1, 2, 1, 3])
>>> s.replace(1, 3)
[3, 2, 3, 3]
Categories (2, int64): [2, 3]
- set_categories(self, new_categories, ordered=None, rename=False, inplace=<no_default>)
- Set the categories to the specified new_categories.
`new_categories` can include new categories (which will result in
unused categories) or remove old categories (which results in values
set to NaN). If `rename==True`, the categories will simple be renamed
(less or more items than in old categories will result in values set to
NaN or in unused categories respectively).
This method can be used to perform more than one action of adding,
removing, and reordering simultaneously and is therefore faster than
performing the individual steps via the more specialised methods.
On the other hand this methods does not do checks (e.g., whether the
old categories are included in the new categories on a reorder), which
can result in surprising changes, for example when using special string
dtypes, which does not considers a S1 string equal to a single char
python string.
Parameters
----------
new_categories : Index-like
The categories in new order.
ordered : bool, default False
Whether or not the categorical is treated as a ordered categorical.
If not given, do not change the ordered information.
rename : bool, default False
Whether or not the new_categories should be considered as a rename
of the old categories or as reordered categories.
inplace : bool, default False
Whether or not to reorder the categories in-place or return a copy
of this categorical with reordered categories.
.. deprecated:: 1.3.0
Returns
-------
Categorical with reordered categories or None if inplace.
Raises
------
ValueError
If new_categories does not validate as categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
- set_ordered(self, value, inplace=False)
- Set the ordered attribute to the boolean value.
Parameters
----------
value : bool
Set whether this categorical is ordered (True) or not (False).
inplace : bool, default False
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to the value.
- sort_values(self, inplace: 'bool' = False, ascending: 'bool' = True, na_position: 'str' = 'last')
- Sort the Categorical by category value returning a new
Categorical by default.
While an ordering is applied to the category values, sorting in this
context refers more to organizing and grouping together based on
matching category values. Thus, this function can be called on an
unordered Categorical instance unlike the functions 'Categorical.min'
and 'Categorical.max'.
Parameters
----------
inplace : bool, default False
Do operation in place.
ascending : bool, default True
Order ascending. Passing False orders descending. The
ordering parameter provides the method by which the
category values are organized.
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end
Returns
-------
Categorical or None
See Also
--------
Categorical.sort
Series.sort_values
Examples
--------
>>> c = pd.Categorical([1, 2, 2, 1, 5])
>>> c
[1, 2, 2, 1, 5]
Categories (3, int64): [1, 2, 5]
>>> c.sort_values()
[1, 1, 2, 2, 5]
Categories (3, int64): [1, 2, 5]
>>> c.sort_values(ascending=False)
[5, 2, 2, 1, 1]
Categories (3, int64): [1, 2, 5]
Inplace sorting can be done as well:
>>> c.sort_values(inplace=True)
>>> c
[1, 1, 2, 2, 5]
Categories (3, int64): [1, 2, 5]
>>>
>>> c = pd.Categorical([1, 2, 2, 1, 5])
'sort_values' behaviour with NaNs. Note that 'na_position'
is independent of the 'ascending' parameter:
>>> c = pd.Categorical([np.nan, 2, 2, np.nan, 5])
>>> c
[NaN, 2, 2, NaN, 5]
Categories (2, int64): [2, 5]
>>> c.sort_values()
[2, 2, 5, NaN, NaN]
Categories (2, int64): [2, 5]
>>> c.sort_values(ascending=False)
[5, 2, 2, NaN, NaN]
Categories (2, int64): [2, 5]
>>> c.sort_values(na_position='first')
[NaN, NaN, 2, 2, 5]
Categories (2, int64): [2, 5]
>>> c.sort_values(ascending=False, na_position='first')
[NaN, NaN, 5, 2, 2]
Categories (2, int64): [2, 5]
- take_nd(self, indexer, allow_fill: 'bool' = False, fill_value=None)
- to_dense(self) -> 'np.ndarray'
- Return my 'dense' representation
For internal compatibility with numpy arrays.
Returns
-------
dense : array
- to_list(self)
- Alias for tolist.
- unique(self)
- Return the ``Categorical`` which ``categories`` and ``codes`` are
unique.
.. versionchanged:: 1.3.0
Previously, unused categories were dropped from the new categories.
Returns
-------
Categorical
See Also
--------
pandas.unique
CategoricalIndex.unique
Series.unique : Return unique values of Series object.
Examples
--------
>>> pd.Categorical(list("baabc")).unique()
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> pd.Categorical(list("baab"), categories=list("abc"), ordered=True).unique()
['b', 'a']
Categories (3, object): ['a' < 'b' < 'c']
- value_counts(self, dropna: 'bool' = True)
- Return a Series containing counts of each category.
Every category will have an entry, even those with a count of 0.
Parameters
----------
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
counts : Series
See Also
--------
Series.value_counts
- view(self, dtype=None)
- Return a view on the array.
Parameters
----------
dtype : str, np.dtype, or ExtensionDtype, optional
Default None.
Returns
-------
ExtensionArray or np.ndarray
A view on the :class:`ExtensionArray`'s data.
Class methods defined here:
- from_codes(codes, categories=None, ordered=None, dtype: 'Dtype | None' = None) from abc.ABCMeta
- Make a Categorical type from codes and categories or dtype.
This constructor is useful if you already have codes and
categories/dtype and so do not need the (computation intensive)
factorization step, which is usually done on the constructor.
If your data does not follow this convention, please use the normal
constructor.
Parameters
----------
codes : array-like of int
An integer array, where each integer points to a category in
categories or dtype.categories, or else is -1 for NaN.
categories : index-like, optional
The categories for the categorical. Items need to be unique.
If the categories are not given here, then they must be provided
in `dtype`.
ordered : bool, optional
Whether or not this categorical is treated as an ordered
categorical. If not given here or in `dtype`, the resulting
categorical will be unordered.
dtype : CategoricalDtype or "category", optional
If :class:`CategoricalDtype`, cannot be used together with
`categories` or `ordered`.
Returns
-------
Categorical
Examples
--------
>>> dtype = pd.CategoricalDtype(['a', 'b'], ordered=True)
>>> pd.Categorical.from_codes(codes=[0, 1, 0, 1], dtype=dtype)
['a', 'b', 'a', 'b']
Categories (2, object): ['a' < 'b']
Readonly properties defined here:
- codes
- The category codes of this categorical.
Codes are an array of integers which are the positions of the actual
values in the categories array.
There is no setter, use the other categorical methods and the normal item
setter to change values in the categorical.
Returns
-------
ndarray[int]
A non-writable view of the `codes` array.
- dtype
- The :class:`~pandas.api.types.CategoricalDtype` for this instance.
- nbytes
- The number of bytes needed to store this object in memory.
- ordered
- Whether the categories have an ordered relationship.
Data descriptors defined here:
- categories
- The categories of this categorical.
Setting assigns new values to each category (effectively a rename of
each individual category).
The assigned value has to be a list-like object. All items must be
unique and the number of items in the new categories must be the same
as the number of items in the old categories.
Assigning to `categories` is a inplace operation!
Raises
------
ValueError
If the new categories do not validate as categories or if the
number of new categories is unequal the number of old categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
- itemsize
- return the size of a single category
Data and other attributes defined here:
- __abstractmethods__ = frozenset()
- __annotations__ = {'_dtype': 'CategoricalDtype'}
- __array_priority__ = 1000
- __hash__ = None
Methods inherited from pandas.core.arrays._mixins.NDArrayBackedExtensionArray:
- __getitem__(self: 'NDArrayBackedExtensionArrayT', key: 'PositionalIndexer2D') -> 'NDArrayBackedExtensionArrayT | Any'
- Select a subset of self.
Parameters
----------
item : int, slice, or ndarray
* int: The position in 'self' to get.
* slice: A slice object, where 'start', 'stop', and 'step' are
integers or None
* ndarray: A 1-d boolean NumPy ndarray the same length as 'self'
* list[int]: A list of int
Returns
-------
item : scalar or ExtensionArray
Notes
-----
For scalar ``item``, return a scalar value suitable for the array's
type. This should be an instance of ``self.dtype.type``.
For slice ``key``, return an instance of ``ExtensionArray``, even
if the slice is length 0 or 1.
For a boolean mask, return an instance of ``ExtensionArray``, filtered
to the values where ``item`` is True.
- __setitem__(self, key, value)
- Set one or more values inplace.
This method is not required to satisfy the pandas extension array
interface.
Parameters
----------
key : int, ndarray, or slice
When called from, e.g. ``Series.__setitem__``, ``key`` will be
one of
* scalar int
* ndarray of integers.
* boolean ndarray
* slice object
value : ExtensionDtype.type, Sequence[ExtensionDtype.type], or object
value or values to be set of ``key``.
Returns
-------
None
- argmax(self, axis: 'int' = 0, skipna: 'bool' = True)
- Return the index of maximum value.
In case of multiple occurrences of the maximum value, the index
corresponding to the first occurrence is returned.
Parameters
----------
skipna : bool, default True
Returns
-------
int
See Also
--------
ExtensionArray.argmin
- argmin(self, axis: 'int' = 0, skipna: 'bool' = True)
- Return the index of minimum value.
In case of multiple occurrences of the minimum value, the index
corresponding to the first occurrence is returned.
Parameters
----------
skipna : bool, default True
Returns
-------
int
See Also
--------
ExtensionArray.argmax
- fillna(self: 'NDArrayBackedExtensionArrayT', value=None, method=None, limit=None) -> 'NDArrayBackedExtensionArrayT'
- Fill NA/NaN values using the specified method.
Parameters
----------
value : scalar, array-like
If a scalar value is passed it is used to fill all missing values.
Alternatively, an array-like 'value' can be given. It's expected
that the array-like have the same length as 'self'.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap.
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled.
Returns
-------
ExtensionArray
With NA/NaN filled.
- insert(self: 'NDArrayBackedExtensionArrayT', loc: 'int', item) -> 'NDArrayBackedExtensionArrayT'
- Make new ExtensionArray inserting new item at location. Follows
Python list.append semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
type(self)
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted array `self` (a) such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
Assuming that `self` is sorted:
====== ================================
`side` returned index `i` satisfies
====== ================================
left ``self[i-1] < value <= self[i]``
right ``self[i-1] <= value < self[i]``
====== ================================
Parameters
----------
value : array-like, list or scalar
Value(s) to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort array a into ascending
order. They are typically the result of argsort.
Returns
-------
array of ints or int
If value is array-like, array of insertion points.
If value is scalar, a single integer.
See Also
--------
numpy.searchsorted : Similar method from NumPy.
- shift(self, periods=1, fill_value=None, axis=0)
- Shift values by desired number.
Newly introduced missing values are filled with
``self.dtype.na_value``.
Parameters
----------
periods : int, default 1
The number of periods to shift. Negative values are allowed
for shifting backwards.
fill_value : object, optional
The scalar value to use for newly introduced missing values.
The default is ``self.dtype.na_value``.
Returns
-------
ExtensionArray
Shifted.
Notes
-----
If ``self`` is empty or ``periods`` is 0, a copy of ``self`` is
returned.
If ``periods > len(self)``, then an array of size
len(self) is returned, with all values filled with
``self.dtype.na_value``.
- take(self: 'NDArrayBackedExtensionArrayT', indices: 'TakeIndexer', *, allow_fill: 'bool' = False, fill_value: 'Any' = None, axis: 'int' = 0) -> 'NDArrayBackedExtensionArrayT'
- Take elements from an array.
Parameters
----------
indices : sequence of int or one-dimensional np.ndarray of int
Indices to be taken.
allow_fill : bool, default False
How to handle negative values in `indices`.
* False: negative values in `indices` indicate positional indices
from the right (the default). This is similar to
:func:`numpy.take`.
* True: negative values in `indices` indicate
missing values. These values are set to `fill_value`. Any other
other negative values raise a ``ValueError``.
fill_value : any, optional
Fill value to use for NA-indices when `allow_fill` is True.
This may be ``None``, in which case the default NA value for
the type, ``self.dtype.na_value``, is used.
For many ExtensionArrays, there will be two representations of
`fill_value`: a user-facing "boxed" scalar, and a low-level
physical NA value. `fill_value` should be the user-facing version,
and the implementation should handle translating that to the
physical version for processing the take if necessary.
Returns
-------
ExtensionArray
Raises
------
IndexError
When the indices are out of bounds for the array.
ValueError
When `indices` contains negative values other than ``-1``
and `allow_fill` is True.
See Also
--------
numpy.take : Take elements from an array along an axis.
api.extensions.take : Take elements from an array.
Notes
-----
ExtensionArray.take is called by ``Series.__getitem__``, ``.loc``,
``iloc``, when `indices` is a sequence of values. Additionally,
it's called by :meth:`Series.reindex`, or any other method
that causes realignment, with a `fill_value`.
Examples
--------
Here's an example implementation, which relies on casting the
extension array to object dtype. This uses the helper method
:func:`pandas.api.extensions.take`.
.. code-block:: python
def take(self, indices, allow_fill=False, fill_value=None):
from pandas.core.algorithms import take
# If the ExtensionArray is backed by an ndarray, then
# just pass that here instead of coercing to object.
data = self.astype(object)
if allow_fill and fill_value is None:
fill_value = self.dtype.na_value
# fill value should always be translated from the scalar
# type for the array, to the physical storage type for
# the data, before passing to take.
result = take(data, indices, fill_value=fill_value,
allow_fill=allow_fill)
return self._from_sequence(result, dtype=self.dtype)
Data descriptors inherited from pandas.core.arrays._mixins.NDArrayBackedExtensionArray:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Methods inherited from pandas._libs.arrays.NDArrayBacked:
- __len__(self, /)
- Return len(self).
- __reduce__ = __reduce_cython__(...)
- __setstate_cython__(...)
- copy(...)
- delete(...)
- ravel(...)
- repeat(...)
- reshape(...)
- swapaxes(...)
- transpose(...)
Static methods inherited from pandas._libs.arrays.NDArrayBacked:
- __new__(*args, **kwargs) from builtins.type
- Create and return a new object. See help(type) for accurate signature.
Data descriptors inherited from pandas._libs.arrays.NDArrayBacked:
- T
- ndim
- shape
- size
Data and other attributes inherited from pandas._libs.arrays.NDArrayBacked:
- __pyx_vtable__ = <capsule object NULL>
Methods inherited from pandas.core.arrays.base.ExtensionArray:
- dropna(self: 'ExtensionArrayT') -> 'ExtensionArrayT'
- Return ExtensionArray without NA values.
Returns
-------
valid : ExtensionArray
- factorize(self, na_sentinel: 'int' = -1) -> 'tuple[np.ndarray, ExtensionArray]'
- Encode the extension array as an enumerated type.
Parameters
----------
na_sentinel : int, default -1
Value to use in the `codes` array to indicate missing values.
Returns
-------
codes : ndarray
An integer NumPy array that's an indexer into the original
ExtensionArray.
uniques : ExtensionArray
An ExtensionArray containing the unique values of `self`.
.. note::
uniques will *not* contain an entry for the NA value of
the ExtensionArray if there are any missing values present
in `self`.
See Also
--------
factorize : Top-level factorize method that dispatches here.
Notes
-----
:meth:`pandas.factorize` offers a `sort` keyword as well.
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>) -> 'np.ndarray'
- Convert to a NumPy ndarray.
.. versionadded:: 1.0.0
This is similar to :meth:`numpy.asarray`, but may provide additional control
over how the conversion is done.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is a not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
Returns
-------
numpy.ndarray
- tolist(self) -> 'list'
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class CategoricalDtype(PandasExtensionDtype, pandas.core.dtypes.base.ExtensionDtype) |
| |
CategoricalDtype(categories=None, ordered: 'Ordered' = False)
Type for categorical data with the categories and orderedness.
Parameters
----------
categories : sequence, optional
Must be unique, and must not contain any nulls.
The categories are stored in an Index,
and if an index is provided the dtype of that index will be used.
ordered : bool or None, default False
Whether or not this categorical is treated as a ordered categorical.
None can be used to maintain the ordered value of existing categoricals when
used in operations that combine categoricals, e.g. astype, and will resolve to
False if there is no existing ordered to maintain.
Attributes
----------
categories
ordered
Methods
-------
None
See Also
--------
Categorical : Represent a categorical variable in classic R / S-plus fashion.
Notes
-----
This class is useful for specifying the type of a ``Categorical``
independent of the values. See :ref:`categorical.categoricaldtype`
for more.
Examples
--------
>>> t = pd.CategoricalDtype(categories=['b', 'a'], ordered=True)
>>> pd.Series(['a', 'b', 'a', 'c'], dtype=t)
0 a
1 b
2 a
3 NaN
dtype: category
Categories (2, object): ['b' < 'a']
An empty CategoricalDtype with a specific dtype can be created
by providing an empty index. As follows,
>>> pd.CategoricalDtype(pd.DatetimeIndex([])).categories.dtype
dtype('<M8[ns]') |
| |
- Method resolution order:
- CategoricalDtype
- PandasExtensionDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __eq__(self, other: 'Any') -> 'bool'
- Rules for CDT equality:
1) Any CDT is equal to the string 'category'
2) Any CDT is equal to itself
3) Any CDT is equal to a CDT with categories=None regardless of ordered
4) A CDT with ordered=True is only equal to another CDT with
ordered=True and identical categories in the same order
5) A CDT with ordered={False, None} is only equal to another CDT with
ordered={False, None} and identical categories, but same order is
not required. There is no distinction between False/None.
6) Any other comparison returns False
- __hash__(self) -> 'int'
- Return hash(self).
- __init__(self, categories=None, ordered: 'Ordered' = False)
- Initialize self. See help(type(self)) for accurate signature.
- __repr__(self) -> 'str_type'
- Return a string representation for a particular object.
- __setstate__(self, state: 'MutableMapping[str_type, Any]') -> 'None'
- update_dtype(self, dtype: 'str_type | CategoricalDtype') -> 'CategoricalDtype'
- Returns a CategoricalDtype with categories and ordered taken from dtype
if specified, otherwise falling back to self if unspecified
Parameters
----------
dtype : CategoricalDtype
Returns
-------
new_dtype : CategoricalDtype
Class methods defined here:
- construct_array_type() -> 'type_t[Categorical]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
- construct_from_string(string: 'str_type') -> 'CategoricalDtype' from builtins.type
- Construct a CategoricalDtype from a string.
Parameters
----------
string : str
Must be the string "category" in order to be successfully constructed.
Returns
-------
CategoricalDtype
Instance of the dtype.
Raises
------
TypeError
If a CategoricalDtype cannot be constructed from the input.
Static methods defined here:
- validate_categories(categories, fastpath: 'bool' = False) -> 'Index'
- Validates that we have good categories
Parameters
----------
categories : array-like
fastpath : bool
Whether to skip nan and uniqueness checks
Returns
-------
categories : Index
- validate_ordered(ordered: 'Ordered') -> 'None'
- Validates that we have a valid ordered parameter. If
it is not a boolean, a TypeError will be raised.
Parameters
----------
ordered : object
The parameter to be verified.
Raises
------
TypeError
If 'ordered' is not a boolean.
Readonly properties defined here:
- categories
- An ``Index`` containing the unique categories allowed.
- ordered
- Whether the categories have an ordered relationship.
Data and other attributes defined here:
- __annotations__ = {'_cache_dtypes': 'dict[str_type, PandasExtensionDtype]', 'kind': 'str_type', 'type': 'type[CategoricalDtypeType]'}
- base = dtype('O')
- kind = 'O'
- name = 'category'
- str = '|O08'
- type = <class 'pandas.core.dtypes.dtypes.CategoricalDtypeType'>
- the type of CategoricalDtype, this metaclass determines subclass ability
Methods inherited from PandasExtensionDtype:
- __getstate__(self) -> 'dict[str_type, Any]'
Class methods inherited from PandasExtensionDtype:
- reset_cache() -> 'None' from builtins.type
- clear the cache
Data and other attributes inherited from PandasExtensionDtype:
- isbuiltin = 0
- isnative = 0
- itemsize = 8
- num = 100
- shape = ()
- subdtype = None
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- na_value
- Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the
user-facing "boxed" version of the NA value, not the physical NA value
for storage. e.g. for JSONArray, this is an empty dictionary.
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class CategoricalIndex(pandas.core.indexes.extension.NDArrayBackedExtensionIndex) |
| |
CategoricalIndex(data=None, categories=None, ordered=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None) -> 'CategoricalIndex'
Index based on an underlying :class:`Categorical`.
CategoricalIndex, like Categorical, can only take on a limited,
and usually fixed, number of possible values (`categories`). Also,
like Categorical, it might have an order, but numerical operations
(additions, divisions, ...) are not possible.
Parameters
----------
data : array-like (1-dimensional)
The values of the categorical. If `categories` are given, values not in
`categories` will be replaced with NaN.
categories : index-like, optional
The categories for the categorical. Items need to be unique.
If the categories are not given here (and also not in `dtype`), they
will be inferred from the `data`.
ordered : bool, optional
Whether or not this categorical is treated as an ordered
categorical. If not given here or in `dtype`, the resulting
categorical will be unordered.
dtype : CategoricalDtype or "category", optional
If :class:`CategoricalDtype`, cannot be used together with
`categories` or `ordered`.
copy : bool, default False
Make a copy of input ndarray.
name : object, optional
Name to be stored in the index.
Attributes
----------
codes
categories
ordered
Methods
-------
rename_categories
reorder_categories
add_categories
remove_categories
remove_unused_categories
set_categories
as_ordered
as_unordered
map
Raises
------
ValueError
If the categories do not validate.
TypeError
If an explicit ``ordered=True`` is given but no `categories` and the
`values` are not sortable.
See Also
--------
Index : The base pandas Index type.
Categorical : A categorical array.
CategoricalDtype : Type for categorical data.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#categoricalindex>`__
for more.
Examples
--------
>>> pd.CategoricalIndex(["a", "b", "c", "a", "b", "c"])
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
categories=['a', 'b', 'c'], ordered=False, dtype='category')
``CategoricalIndex`` can also be instantiated from a ``Categorical``:
>>> c = pd.Categorical(["a", "b", "c", "a", "b", "c"])
>>> pd.CategoricalIndex(c)
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
categories=['a', 'b', 'c'], ordered=False, dtype='category')
Ordered ``CategoricalIndex`` can have a min and max value.
>>> ci = pd.CategoricalIndex(
... ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
... )
>>> ci
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
categories=['c', 'b', 'a'], ordered=True, dtype='category')
>>> ci.min()
'c' |
| |
- Method resolution order:
- CategoricalIndex
- pandas.core.indexes.extension.NDArrayBackedExtensionIndex
- pandas.core.indexes.extension.ExtensionIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __contains__(self, key: 'Any') -> 'bool'
- Return a boolean indicating whether the provided key is in the index.
Parameters
----------
key : label
The key to check if it is present in the index.
Returns
-------
bool
Whether the key search is in the index.
Raises
------
TypeError
If the key is not hashable.
See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
list-like key is in the index.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> 2 in idx
True
>>> 6 in idx
False
- add_categories(self, *args, **kwargs)
- Add new categories.
`new_categories` will be included at the last/highest place in the
categories and will be unused directly after this call.
Parameters
----------
new_categories : category or list-like of category
The new categories to be included.
inplace : bool, default False
Whether or not to add the categories inplace or return a copy of
this categorical with added categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with new categories added or None if ``inplace=True``.
Raises
------
ValueError
If the new categories include old categories or do not validate as
categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['c', 'b', 'c'])
>>> c
['c', 'b', 'c']
Categories (2, object): ['b', 'c']
>>> c.add_categories(['d', 'a'])
['c', 'b', 'c']
Categories (4, object): ['b', 'c', 'd', 'a']
- argsort(self, *args, **kwargs)
- Return the indices that would sort the Categorical.
.. versionchanged:: 0.25.0
Changed to sort missing values at the end.
Parameters
----------
ascending : bool, default True
Whether the indices should result in an ascending
or descending sort.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
Sorting algorithm.
**kwargs:
passed through to :func:`numpy.argsort`.
Returns
-------
np.ndarray[np.intp]
See Also
--------
numpy.ndarray.argsort
Notes
-----
While an ordering is applied to the category values, arg-sorting
in this context refers more to organizing and grouping together
based on matching category values. Thus, this function can be
called on an unordered Categorical instance unlike the functions
'Categorical.min' and 'Categorical.max'.
Examples
--------
>>> pd.Categorical(['b', 'b', 'a', 'c']).argsort()
array([2, 0, 1, 3])
>>> cat = pd.Categorical(['b', 'b', 'a', 'c'],
... categories=['c', 'b', 'a'],
... ordered=True)
>>> cat.argsort()
array([3, 0, 1, 2])
Missing values are placed at the end
>>> cat = pd.Categorical([2, None, 1])
>>> cat.argsort()
array([2, 0, 1])
- as_ordered(self, *args, **kwargs)
- Set the Categorical to be ordered.
Parameters
----------
inplace : bool, default False
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to True.
Returns
-------
Categorical or None
Ordered Categorical or None if ``inplace=True``.
- as_unordered(self, *args, **kwargs)
- Set the Categorical to be unordered.
Parameters
----------
inplace : bool, default False
Whether or not to set the ordered attribute in-place or return
a copy of this categorical with ordered set to False.
Returns
-------
Categorical or None
Unordered Categorical or None if ``inplace=True``.
- astype(self, dtype: 'Dtype', copy: 'bool' = True) -> 'Index'
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- equals(self, other: 'object') -> 'bool'
- Determine if two CategoricalIndex objects contain the same elements.
Returns
-------
bool
If two CategoricalIndex objects have equal elements True,
otherwise False.
- is_dtype_equal(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- map(self, mapper)
- Map values using input an input mapping or function.
Maps the values (their categories, not the codes) of the index to new
categories. If the mapping correspondence is one-to-one the result is a
:class:`~pandas.CategoricalIndex` which has the same order property as
the original, otherwise an :class:`~pandas.Index` is returned.
If a `dict` or :class:`~pandas.Series` is used any unmapped category is
mapped to `NaN`. Note that if this happens an :class:`~pandas.Index`
will be returned.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
Returns
-------
pandas.CategoricalIndex or pandas.Index
Mapped index.
See Also
--------
Index.map : Apply a mapping correspondence on an
:class:`~pandas.Index`.
Series.map : Apply a mapping correspondence on a
:class:`~pandas.Series`.
Series.apply : Apply more complex functions on a
:class:`~pandas.Series`.
Examples
--------
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'],
ordered=False, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first',
'second', 'third'], ordered=False, dtype='category')
If the mapping is one-to-one the ordering of the categories is
preserved:
>>> idx = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'],
ordered=True, dtype='category')
>>> idx.map({'a': 3, 'b': 2, 'c': 1})
CategoricalIndex([3, 2, 1], categories=[3, 2, 1], ordered=True,
dtype='category')
If the mapping is not one-to-one an :class:`~pandas.Index` is returned:
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')
If a `dict` is used, all unmapped categories are mapped to `NaN` and
the result is an :class:`~pandas.Index`:
>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', nan], dtype='object')
- max(self, *args, **kwargs)
- The maximum value of the object.
Only ordered `Categoricals` have a maximum!
.. versionchanged:: 1.0.0
Returns an NA value on empty arrays
Raises
------
TypeError
If the `Categorical` is not `ordered`.
Returns
-------
max : the maximum of this `Categorical`
- min(self, *args, **kwargs)
- The minimum value of the object.
Only ordered `Categoricals` have a minimum!
.. versionchanged:: 1.0.0
Returns an NA value on empty arrays
Raises
------
TypeError
If the `Categorical` is not `ordered`.
Returns
-------
min : the minimum of this `Categorical`
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values (move/add/delete values as necessary)
Returns
-------
new_index : pd.Index
Resulting index
indexer : np.ndarray[np.intp] or None
Indices of output values in original index
- remove_categories(self, *args, **kwargs)
- Remove the specified categories.
`removals` must be included in the old categories. Values which were in
the removed categories will be set to NaN
Parameters
----------
removals : category or list of categories
The categories which should be removed.
inplace : bool, default False
Whether or not to remove the categories inplace or return a copy of
this categorical with removed categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with removed categories or None if ``inplace=True``.
Raises
------
ValueError
If the removals are not contained in the categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['a', 'c', 'b', 'c', 'd'])
>>> c
['a', 'c', 'b', 'c', 'd']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> c.remove_categories(['d', 'a'])
[NaN, 'c', 'b', 'c', NaN]
Categories (2, object): ['b', 'c']
- remove_unused_categories(self, *args, **kwargs)
- Remove categories which are not used.
Parameters
----------
inplace : bool, default False
Whether or not to drop unused categories inplace or return a copy of
this categorical with unused categories dropped.
.. deprecated:: 1.2.0
Returns
-------
cat : Categorical or None
Categorical with unused categories dropped or None if ``inplace=True``.
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['a', 'c', 'b', 'c', 'd'])
>>> c
['a', 'c', 'b', 'c', 'd']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> c[2] = 'a'
>>> c[4] = 'c'
>>> c
['a', 'c', 'a', 'c', 'c']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> c.remove_unused_categories()
['a', 'c', 'a', 'c', 'c']
Categories (2, object): ['a', 'c']
- rename_categories(self, *args, **kwargs)
- Rename categories.
Parameters
----------
new_categories : list-like, dict-like or callable
New categories which will replace old categories.
* list-like: all items must be unique and the number of items in
the new categories must match the existing number of categories.
* dict-like: specifies a mapping from
old categories to new. Categories not contained in the mapping
are passed through and extra categories in the mapping are
ignored.
* callable : a callable that is called on all items in the old
categories and whose return values comprise the new categories.
inplace : bool, default False
Whether or not to rename the categories inplace or return a copy of
this categorical with renamed categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with removed categories or None if ``inplace=True``.
Raises
------
ValueError
If new categories are list-like and do not have the same number of
items than the current categories or do not validate as categories
See Also
--------
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
Examples
--------
>>> c = pd.Categorical(['a', 'a', 'b'])
>>> c.rename_categories([0, 1])
[0, 0, 1]
Categories (2, int64): [0, 1]
For dict-like ``new_categories``, extra keys are ignored and
categories not in the dictionary are passed through
>>> c.rename_categories({'a': 'A', 'c': 'C'})
['A', 'A', 'b']
Categories (2, object): ['A', 'b']
You may also provide a callable to create the new categories
>>> c.rename_categories(lambda x: x.upper())
['A', 'A', 'B']
Categories (2, object): ['A', 'B']
- reorder_categories(self, *args, **kwargs)
- Reorder categories as specified in new_categories.
`new_categories` need to include all old categories and no new category
items.
Parameters
----------
new_categories : Index-like
The categories in new order.
ordered : bool, optional
Whether or not the categorical is treated as a ordered categorical.
If not given, do not change the ordered information.
inplace : bool, default False
Whether or not to reorder the categories inplace or return a copy of
this categorical with reordered categories.
.. deprecated:: 1.3.0
Returns
-------
cat : Categorical or None
Categorical with removed categories or None if ``inplace=True``.
Raises
------
ValueError
If the new categories do not contain all old category items or any
new ones
See Also
--------
rename_categories : Rename categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
- searchsorted(self, *args, **kwargs)
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted array `self` (a) such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
Assuming that `self` is sorted:
====== ================================
`side` returned index `i` satisfies
====== ================================
left ``self[i-1] < value <= self[i]``
right ``self[i-1] <= value < self[i]``
====== ================================
Parameters
----------
value : array-like, list or scalar
Value(s) to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort array a into ascending
order. They are typically the result of argsort.
Returns
-------
array of ints or int
If value is array-like, array of insertion points.
If value is scalar, a single integer.
See Also
--------
numpy.searchsorted : Similar method from NumPy.
- set_categories(self, *args, **kwargs)
- Set the categories to the specified new_categories.
`new_categories` can include new categories (which will result in
unused categories) or remove old categories (which results in values
set to NaN). If `rename==True`, the categories will simple be renamed
(less or more items than in old categories will result in values set to
NaN or in unused categories respectively).
This method can be used to perform more than one action of adding,
removing, and reordering simultaneously and is therefore faster than
performing the individual steps via the more specialised methods.
On the other hand this methods does not do checks (e.g., whether the
old categories are included in the new categories on a reorder), which
can result in surprising changes, for example when using special string
dtypes, which does not considers a S1 string equal to a single char
python string.
Parameters
----------
new_categories : Index-like
The categories in new order.
ordered : bool, default False
Whether or not the categorical is treated as a ordered categorical.
If not given, do not change the ordered information.
rename : bool, default False
Whether or not the new_categories should be considered as a rename
of the old categories or as reordered categories.
inplace : bool, default False
Whether or not to reorder the categories in-place or return a copy
of this categorical with reordered categories.
.. deprecated:: 1.3.0
Returns
-------
Categorical with reordered categories or None if inplace.
Raises
------
ValueError
If new_categories does not validate as categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
- take_nd(self, *args, **kwargs)
- Alias for `take`
- tolist(self, *args, **kwargs)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
Static methods defined here:
- __new__(cls, data=None, categories=None, ordered=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None) -> 'CategoricalIndex'
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- inferred_type
- Return a string of the type inferred from the values.
Data descriptors defined here:
- categories
- The categories of this categorical.
Setting assigns new values to each category (effectively a rename of
each individual category).
The assigned value has to be a list-like object. All items must be
unique and the number of items in the new categories must be the same
as the number of items in the old categories.
Assigning to `categories` is a inplace operation!
Raises
------
ValueError
If the new categories do not validate as categories or if the
number of new categories is unequal the number of old categories
See Also
--------
rename_categories : Rename categories.
reorder_categories : Reorder categories.
add_categories : Add new categories.
remove_categories : Remove the specified categories.
remove_unused_categories : Remove categories which are not used.
set_categories : Set the categories to the specified ones.
- codes
- The category codes of this categorical.
Codes are an array of integers which are the positions of the actual
values in the categories array.
There is no setter, use the other categorical methods and the normal item
setter to change values in the categorical.
Returns
-------
ndarray[int]
A non-writable view of the `codes` array.
- ordered
- Whether the categories have an ordered relationship.
Data and other attributes defined here:
- __annotations__ = {'_data': 'Categorical', '_values': 'Categorical', 'categories': 'Index', 'codes': 'np.ndarray', 'ordered': 'bool | None'}
Methods inherited from pandas.core.indexes.base.Index:
- __abs__(self)
- __and__(self, other)
- __array__(self, dtype=None) -> 'np.ndarray'
- The array interface, return my values.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other)
- __invert__(self)
- __len__(self) -> 'int'
- Return the length of the Index.
- __neg__(self)
- __nonzero__(self)
- __or__(self, other)
- __pos__(self)
- __reduce__(self)
- Helper for pickle.
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- copy(self: '_IndexT', name: 'Hashable | None' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names: 'Sequence[Hashable] | None' = None) -> '_IndexT'
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- delete(self: '_IndexT', loc) -> '_IndexT'
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str_t' = 'NaN') -> 'list[str_t]'
- Render a string representation of the Index.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_loc(self, key, method=None, tolerance=None)
- Get integer location, slice or boolean mask for requested label.
Parameters
----------
key : label
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
tolerance : int or float, optional
Maximum distance from index value for inexact matches. The value of
the index at the matching location must satisfy the equation
``abs(index[loc] - key) <= tolerance``.
Returns
-------
loc : int if unique index, slice if monotonic index, else mask
Examples
--------
>>> unique_index = pd.Index(list('abc'))
>>> unique_index.get_loc('b')
1
>>> monotonic_index = pd.Index(list('abbc'))
>>> monotonic_index.get_loc('b')
slice(1, 3, None)
>>> non_monotonic_index = pd.Index(list('abcb'))
>>> non_monotonic_index.get_loc('b')
array([False, True, False, True])
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- insert(self, loc: 'int', item) -> 'Index'
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- is_type_compatible(self, kind: 'str_t') -> 'bool'
- Whether the index type is compatible with the provided type.
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- shift(self, periods=1, freq=None)
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or str, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.Index
Shifted index.
See Also
--------
Series.shift : Shift values of Series.
Notes
-----
This method is only implemented for datetime-like index classes,
i.e., DatetimeIndex, PeriodIndex and TimedeltaIndex.
Examples
--------
Put the first 5 month starts of 2011 into an index.
>>> month_starts = pd.date_range('1/1/2011', periods=5, freq='MS')
>>> month_starts
DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01',
'2011-05-01'],
dtype='datetime64[ns]', freq='MS')
Shift the index by 10 days.
>>> month_starts.shift(10, freq='D')
DatetimeIndex(['2011-01-11', '2011-02-11', '2011-03-11', '2011-04-11',
'2011-05-11'],
dtype='datetime64[ns]', freq=None)
The default value of `freq` is the `freq` attribute of the index,
which is 'MS' (month start) in this example.
>>> month_starts.shift(10)
DatetimeIndex(['2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01'],
dtype='datetime64[ns]', freq='MS')
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- take(self, indices, axis: 'int' = 0, allow_fill: 'bool' = True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- is_monotonic_decreasing
- Return if the index is monotonic decreasing (only equal or
decreasing) values.
Examples
--------
>>> Index([3, 2, 1]).is_monotonic_decreasing
True
>>> Index([3, 2, 2]).is_monotonic_decreasing
True
>>> Index([3, 1, 2]).is_monotonic_decreasing
False
- is_monotonic_increasing
- Return if the index is monotonic increasing (only equal or
increasing) values.
Examples
--------
>>> Index([1, 2, 3]).is_monotonic_increasing
True
>>> Index([1, 2, 2]).is_monotonic_increasing
True
>>> Index([1, 3, 2]).is_monotonic_increasing
False
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- dtype
- Return the dtype object of the underlying data.
- hasnans
- Return True if there are any NaNs.
Enables various performance speedups.
- is_all_dates
- Whether or not the index values only consist of dates.
- is_unique
- Return if the index has unique values.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- to_list = tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class DataFrame(pandas.core.generic.NDFrame, pandas.core.arraylike.OpsMixin) |
| |
DataFrame(data=None, index: 'Axes | None' = None, columns: 'Axes | None' = None, dtype: 'Dtype | None' = None, copy: 'bool | None' = None)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.
Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, dataclass or list-like objects. If
data is a dict, column order follows insertion-order. If a dict contains Series
which have an index defined, it is aligned by its index.
.. versionchanged:: 0.25.0
If data is a list of dicts, column order follows insertion-order.
index : Index or array-like
Index to use for resulting frame. Will default to RangeIndex if
no indexing information part of input data and no index provided.
columns : Index or array-like
Column labels to use for resulting frame when data does not have them,
defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,
will perform column selection instead.
dtype : dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool or None, default None
Copy data from inputs.
For dict data, the default of None behaves like ``copy=True``. For DataFrame
or 2d ndarray input, the default of None behaves like ``copy=False``.
.. versionchanged:: 1.3.0
See Also
--------
DataFrame.from_records : Constructor from tuples, also record arrays.
DataFrame.from_dict : From dicts of Series, arrays, or dicts.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_table : Read general delimited file into DataFrame.
read_clipboard : Read text from clipboard into DataFrame.
Examples
--------
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 1 3
1 2 4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1 int64
col2 int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1 int8
col2 int8
dtype: object
Constructing DataFrame from a dictionary including Series:
>>> d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
>>> pd.DataFrame(data=d, index=[0, 1, 2, 3])
col1 col2
0 0 NaN
1 1 NaN
2 2 2.0
3 3 3.0
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
... columns=['a', 'b', 'c'])
>>> df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9
Constructing DataFrame from a numpy ndarray that has labeled columns:
>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
... dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
...
>>> df3
c a
0 3 1
1 6 4
2 9 7
Constructing DataFrame from dataclass:
>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
x y
0 0 0
1 0 3
2 2 3 |
| |
- Method resolution order:
- DataFrame
- pandas.core.generic.NDFrame
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- pandas.core.indexing.IndexingMixin
- pandas.core.arraylike.OpsMixin
- builtins.object
Methods defined here:
- __divmod__(self, other) -> 'tuple[DataFrame, DataFrame]'
- __getitem__(self, key)
- __init__(self, data=None, index: 'Axes | None' = None, columns: 'Axes | None' = None, dtype: 'Dtype | None' = None, copy: 'bool | None' = None)
- Initialize self. See help(type(self)) for accurate signature.
- __len__(self) -> 'int'
- Returns length of info axis, but here we use the index.
- __matmul__(self, other: 'AnyArrayLike | DataFrame | Series') -> 'DataFrame | Series'
- Matrix multiplication using binary `@` operator in Python>=3.5.
- __rdivmod__(self, other) -> 'tuple[DataFrame, DataFrame]'
- __repr__(self) -> 'str'
- Return a string representation for a particular DataFrame.
- __rmatmul__(self, other)
- Matrix multiplication using binary `@` operator in Python>=3.5.
- __setitem__(self, key, value)
- add(self, other, axis='columns', level=None, fill_value=None)
- Get Addition of dataframe and other, element-wise (binary operator `add`).
Equivalent to ``dataframe + other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `radd`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- agg = aggregate(self, func=None, axis: 'Axis' = 0, *args, **kwargs)
- aggregate(self, func=None, axis: 'Axis' = 0, *args, **kwargs)
- Aggregate using one or more operations over the specified axis.
Parameters
----------
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either
work when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. ``[np.sum, 'mean']``
- dict of axis labels -> functions, function names or list of such.
axis : {0 or 'index', 1 or 'columns'}, default 0
If 0 or 'index': apply function to each column.
If 1 or 'columns': apply function to each row.
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Returns
-------
scalar, Series or DataFrame
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
Return scalar, Series or DataFrame.
The aggregation operations are always performed over an axis, either the
index (default) or the column axis. This behavior is different from
`numpy` aggregation functions (`mean`, `median`, `prod`, `sum`, `std`,
`var`), where the default is to compute the aggregation of the flattened
array, e.g., ``numpy.mean(arr_2d)`` as opposed to
``numpy.mean(arr_2d, axis=0)``.
`agg` is an alias for `aggregate`. Use the alias.
See Also
--------
DataFrame.apply : Perform any type of operations.
DataFrame.transform : Perform transformation type operations.
core.groupby.GroupBy : Perform operations over groups.
core.resample.Resampler : Perform operations over resampled bins.
core.window.Rolling : Perform operations over rolling window.
core.window.Expanding : Perform operations over expanding window.
core.window.ExponentialMovingWindow : Perform operation over exponential weighted
window.
Notes
-----
`agg` is an alias for `aggregate`. Use the alias.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
A passed user-defined-function will be passed a Series for evaluation.
Examples
--------
>>> df = pd.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C'])
Aggregate these functions over the rows.
>>> df.agg(['sum', 'min'])
A B C
sum 12.0 15.0 18.0
min 1.0 2.0 3.0
Different aggregations per column.
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
A B
sum 12.0 NaN
min 1.0 2.0
max NaN 8.0
Aggregate different functions over the columns and rename the index of the resulting
DataFrame.
>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
A B C
x 7.0 NaN NaN
y NaN 2.0 NaN
z NaN NaN 6.0
Aggregate over the columns.
>>> df.agg("mean", axis="columns")
0 2.0
1 5.0
2 8.0
3 NaN
dtype: float64
- align(self, other, join: 'str' = 'outer', axis: 'Axis | None' = None, level: 'Level | None' = None, copy: 'bool' = True, fill_value=None, method: 'str | None' = None, limit=None, fill_axis: 'Axis' = 0, broadcast_axis: 'Axis | None' = None) -> 'DataFrame'
- Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
Parameters
----------
other : DataFrame or Series
join : {'outer', 'inner', 'left', 'right'}, default 'outer'
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None).
level : int or level name, default None
Broadcast across a level, matching Index values on the
passed MultiIndex level.
copy : bool, default True
Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any
"compatible" value.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series:
- pad / ffill: propagate last valid observation forward to next valid.
- backfill / bfill: use NEXT valid observation to fill gap.
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
fill_axis : {0 or 'index', 1 or 'columns'}, default 0
Filling axis, method and limit.
broadcast_axis : {0 or 'index', 1 or 'columns'}, default None
Broadcast values along this axis, if aligning two objects of
different dimensions.
Returns
-------
(left, right) : (DataFrame, type of other)
Aligned objects.
Examples
--------
>>> df = pd.DataFrame(
... [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(
... [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
... columns=["A", "B", "C", "D"],
... index=[2, 3, 4],
... )
>>> df
D B E A
1 1 2 3 4
2 6 7 8 9
>>> other
A B C D
2 10 20 30 40
3 60 70 80 90
4 600 700 800 900
Align on columns:
>>> left, right = df.align(other, join="outer", axis=1)
>>> left
A B C D E
1 4 2 NaN 1 3
2 9 7 NaN 6 8
>>> right
A B C D E
2 10 20 30 40 NaN
3 60 70 80 90 NaN
4 600 700 800 900 NaN
We can also align on the index:
>>> left, right = df.align(other, join="outer", axis=0)
>>> left
D B E A
1 1.0 2.0 3.0 4.0
2 6.0 7.0 8.0 9.0
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
>>> right
A B C D
1 NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0
3 60.0 70.0 80.0 90.0
4 600.0 700.0 800.0 900.0
Finally, the default `axis=None` will align on both index and columns:
>>> left, right = df.align(other, join="outer", axis=None)
>>> left
A B C D E
1 4.0 2.0 NaN 1.0 3.0
2 9.0 7.0 NaN 6.0 8.0
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
>>> right
A B C D E
1 NaN NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0 NaN
3 60.0 70.0 80.0 90.0 NaN
4 600.0 700.0 800.0 900.0 NaN
- all(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs)
- Return whether all elements are True, potentially over an axis.
Returns True unless there at least one element within a series or
along a Dataframe axis that is False or equivalent (e.g. zero or
empty).
Parameters
----------
axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced.
* 0 / 'index' : reduce the index, return a Series whose index is the
original column labels.
* 1 / 'columns' : reduce the columns, return a Series whose index is the
original index.
* None : reduce all axes, return a scalar.
bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything,
then use only boolean data. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be True, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
**kwargs : any, default None
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series
is returned.
See Also
--------
Series.all : Return True if all elements are True.
DataFrame.any : Return True if one (or more) elements are True.
Examples
--------
**Series**
>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True
**DataFrames**
Create a dataframe from a dictionary.
>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
col1 col2
0 True True
1 True False
Default behaviour checks if column-wise values all return True.
>>> df.all()
col1 True
col2 False
dtype: bool
Specify ``axis='columns'`` to check if row-wise values all return True.
>>> df.all(axis='columns')
0 True
1 False
dtype: bool
Or ``axis=None`` for whether every value is True.
>>> df.all(axis=None)
False
- any(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs)
- Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a series or
along a Dataframe axis that is True or equivalent (e.g. non-zero or
non-empty).
Parameters
----------
axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced.
* 0 / 'index' : reduce the index, return a Series whose index is the
original column labels.
* 1 / 'columns' : reduce the columns, return a Series whose index is the
original index.
* None : reduce all axes, return a scalar.
bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything,
then use only boolean data. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be False, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
**kwargs : any, default None
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series
is returned.
See Also
--------
numpy.any : Numpy version of this method.
Series.any : Return whether any element is True.
Series.all : Return whether all elements are True.
DataFrame.any : Return whether any element is True over requested axis.
DataFrame.all : Return whether all elements are True over requested axis.
Examples
--------
**Series**
For Series input, the output is a scalar indicating whether any element
is True.
>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True
**DataFrame**
Whether each column contains at least one True element (the default).
>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
A B C
0 1 0 0
1 2 2 0
>>> df.any()
A True
B True
C False
dtype: bool
Aggregating over the columns.
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
A B
0 True 1
1 False 2
>>> df.any(axis='columns')
0 True
1 True
dtype: bool
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
A B
0 True 1
1 False 0
>>> df.any(axis='columns')
0 True
1 False
dtype: bool
Aggregating over the entire DataFrame with ``axis=None``.
>>> df.any(axis=None)
True
`any` for an empty DataFrame is an empty Series.
>>> pd.DataFrame([]).any()
Series([], dtype: bool)
- append(self, other, ignore_index: 'bool' = False, verify_integrity: 'bool' = False, sort: 'bool' = False) -> 'DataFrame'
- Append rows of `other` to the end of caller, returning a new object.
.. deprecated:: 1.4.0
Use :func:`concat` instead. For further details see
:ref:`whatsnew_140.deprecations.frame_series_append`
Columns in `other` that are not in the caller are added as new columns.
Parameters
----------
other : DataFrame or Series/dict-like object, or list of these
The data to append.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
verify_integrity : bool, default False
If True, raise ValueError on creating index with duplicates.
sort : bool, default False
Sort columns if the columns of `self` and `other` are not aligned.
.. versionchanged:: 1.0.0
Changed to not sort by default.
Returns
-------
DataFrame
A new DataFrame consisting of the rows of caller and the rows of `other`.
See Also
--------
concat : General function to concatenate DataFrame or Series objects.
Notes
-----
If a list of dict/series is passed and the keys are all contained in
the DataFrame's index, the order of the columns in the resulting
DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally
intensive than a single concatenate. A better solution is to append
those rows to a list and then concatenate the list with the original
DataFrame all at once.
Examples
--------
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
>>> df
A B
x 1 2
y 3 4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])
>>> df.append(df2)
A B
x 1 2
y 3 4
x 5 6
y 7 8
With `ignore_index` set to True:
>>> df.append(df2, ignore_index=True)
A B
0 1 2
1 3 4
2 5 6
3 7 8
The following, while not recommended methods for generating DataFrames,
show two ways to generate a DataFrame from multiple data sources.
Less efficient:
>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
... df = df.append({'A': i}, ignore_index=True)
>>> df
A
0 0
1 1
2 2
3 3
4 4
More efficient:
>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
... ignore_index=True)
A
0 0
1 1
2 2
3 3
4 4
- apply(self, func: 'AggFuncType', axis: 'Axis' = 0, raw: 'bool' = False, result_type=None, args=(), **kwargs)
- Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is
either the DataFrame's index (``axis=0``) or the DataFrame's columns
(``axis=1``). By default (``result_type=None``), the final return type
is inferred from the return type of the applied function. Otherwise,
it depends on the `result_type` argument.
Parameters
----------
func : function
Function to apply to each column or row.
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis along which the function is applied:
* 0 or 'index': apply function to each column.
* 1 or 'columns': apply function to each row.
raw : bool, default False
Determines if row or column is passed as a Series or ndarray object:
* ``False`` : passes each row or column as a Series to the
function.
* ``True`` : the passed function will receive ndarray objects
instead.
If you are just applying a NumPy reduction function this will
achieve much better performance.
result_type : {'expand', 'reduce', 'broadcast', None}, default None
These only act when ``axis=1`` (columns):
* 'expand' : list-like results will be turned into columns.
* 'reduce' : returns a Series if possible rather than expanding
list-like results. This is the opposite of 'expand'.
* 'broadcast' : results will be broadcast to the original shape
of the DataFrame, the original index and columns will be
retained.
The default behaviour (None) depends on the return value of the
applied function: list-like results will be returned as a Series
of those. However if the apply function returns a Series these
are expanded to columns.
args : tuple
Positional arguments to pass to `func` in addition to the
array/series.
**kwargs
Additional keyword arguments to pass as keywords arguments to
`func`.
Returns
-------
Series or DataFrame
Result of applying ``func`` along the given axis of the
DataFrame.
See Also
--------
DataFrame.applymap: For elementwise operations.
DataFrame.aggregate: Only perform aggregating type operations.
DataFrame.transform: Only perform transforming type operations.
Notes
-----
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
Examples
--------
>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
A B
0 4 9
1 4 9
2 4 9
Using a numpy universal function (in this case the same as
``np.sqrt(df)``):
>>> df.apply(np.sqrt)
A B
0 2.0 3.0
1 2.0 3.0
2 2.0 3.0
Using a reducing function on either axis
>>> df.apply(np.sum, axis=0)
A 12
B 27
dtype: int64
>>> df.apply(np.sum, axis=1)
0 13
1 13
2 13
dtype: int64
Returning a list-like will result in a Series
>>> df.apply(lambda x: [1, 2], axis=1)
0 [1, 2]
1 [1, 2]
2 [1, 2]
dtype: object
Passing ``result_type='expand'`` will expand list-like results
to columns of a Dataframe
>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
0 1
0 1 2
1 1 2
2 1 2
Returning a Series inside the function is similar to passing
``result_type='expand'``. The resulting column names
will be the Series index.
>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
foo bar
0 1 2
1 1 2
2 1 2
Passing ``result_type='broadcast'`` will ensure the same shape
result, whether list-like or scalar is returned by the function,
and broadcast it along the axis. The resulting column names will
be the originals.
>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
A B
0 1 2
1 1 2
2 1 2
- applymap(self, func: 'PythonFuncType', na_action: 'str | None' = None, **kwargs) -> 'DataFrame'
- Apply a function to a Dataframe elementwise.
This method applies a function that accepts and returns a scalar
to every element of a DataFrame.
Parameters
----------
func : callable
Python function, returns a single value from a single value.
na_action : {None, 'ignore'}, default None
If ‘ignore’, propagate NaN values, without passing them to func.
.. versionadded:: 1.2
**kwargs
Additional keyword arguments to pass as keywords arguments to
`func`.
.. versionadded:: 1.3.0
Returns
-------
DataFrame
Transformed DataFrame.
See Also
--------
DataFrame.apply : Apply a function along input axis of DataFrame.
Examples
--------
>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
0 1
0 1.000 2.120
1 3.356 4.567
>>> df.applymap(lambda x: len(str(x)))
0 1
0 3 4
1 5 5
Like Series.map, NA values can be ignored:
>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.applymap(lambda x: len(str(x)), na_action='ignore')
0 1
0 <NA> 4
1 5 5
Note that a vectorized version of `func` often exists, which will
be much faster. You could square each number elementwise.
>>> df.applymap(lambda x: x**2)
0 1
0 1.000000 4.494400
1 11.262736 20.857489
But it's better to avoid applymap in that case.
>>> df ** 2
0 1
0 1.000000 4.494400
1 11.262736 20.857489
- asfreq(self, freq: 'Frequency', method=None, how: 'str | None' = None, normalize: 'bool' = False, fill_value=None) -> 'DataFrame'
- Convert time series to specified frequency.
Returns the original data conformed to a new index with the specified
frequency.
If the index of this DataFrame is a :class:`~pandas.PeriodIndex`, the new index
is the result of transforming the original index with
:meth:`PeriodIndex.asfreq <pandas.PeriodIndex.asfreq>` (so the original index
will map one-to-one to the new index).
Otherwise, the new index will be equivalent to ``pd.date_range(start, end,
freq=freq)`` where ``start`` and ``end`` are, respectively, the first and
last entries in the original index (see :func:`pandas.date_range`). The
values corresponding to any timesteps in the new index which were not present
in the original index will be null (``NaN``), unless a method for filling
such unknowns is provided (see the ``method`` parameter below).
The :meth:`resample` method is more appropriate if an operation on each group of
timesteps (such as an aggregate) is necessary to represent the data at the new
frequency.
Parameters
----------
freq : DateOffset or str
Frequency DateOffset or string.
method : {'backfill'/'bfill', 'pad'/'ffill'}, default None
Method to use for filling holes in reindexed Series (note this
does not fill NaNs that already were present):
* 'pad' / 'ffill': propagate last valid observation forward to next
valid
* 'backfill' / 'bfill': use NEXT valid observation to fill.
how : {'start', 'end'}, default end
For PeriodIndex only (see PeriodIndex.asfreq).
normalize : bool, default False
Whether to reset output index to midnight.
fill_value : scalar, optional
Value to use for missing values, applied during upsampling (note
this does not fill NaNs that already were present).
Returns
-------
DataFrame
DataFrame object reindexed to the specified frequency.
See Also
--------
reindex : Conform DataFrame to new index with optional filling logic.
Notes
-----
To learn more about the frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
Examples
--------
Start by creating a series with 4 one minute timestamps.
>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
s
2000-01-01 00:00:00 0.0
2000-01-01 00:01:00 NaN
2000-01-01 00:02:00 2.0
2000-01-01 00:03:00 3.0
Upsample the series into 30 second bins.
>>> df.asfreq(freq='30S')
s
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 NaN
2000-01-01 00:01:30 NaN
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 NaN
2000-01-01 00:03:00 3.0
Upsample again, providing a ``fill value``.
>>> df.asfreq(freq='30S', fill_value=9.0)
s
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 9.0
2000-01-01 00:01:00 NaN
2000-01-01 00:01:30 9.0
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 9.0
2000-01-01 00:03:00 3.0
Upsample again, providing a ``method``.
>>> df.asfreq(freq='30S', method='bfill')
s
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 NaN
2000-01-01 00:01:30 2.0
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 3.0
2000-01-01 00:03:00 3.0
- assign(self, **kwargs) -> 'DataFrame'
- Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones.
Existing columns that are re-assigned will be overwritten.
Parameters
----------
**kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. The callable must not
change input DataFrame (though pandas doesn't check it).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
Returns
-------
DataFrame
A new DataFrame with the new columns in addition to
all the existing columns.
Notes
-----
Assigning multiple columns within the same ``assign`` is possible.
Later items in '\*\*kwargs' may refer to newly created or modified
columns in 'df'; items are computed and assigned into 'df' in order.
Examples
--------
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
... index=['Portland', 'Berkeley'])
>>> df
temp_c
Portland 17.0
Berkeley 25.0
Where the value is a callable, evaluated on `df`:
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
Alternatively, the same behavior can be achieved by directly
referencing an existing Series or sequence:
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
You can create multiple columns within the same assign where one
of the columns depends on another one defined within the same assign:
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
... temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
temp_c temp_f temp_k
Portland 17.0 62.6 290.15
Berkeley 25.0 77.0 298.15
- bfill(self: 'DataFrame', axis: 'None | Axis' = None, inplace: 'bool' = False, limit: 'None | int' = None, downcast=None) -> 'DataFrame | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='bfill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- boxplot = boxplot_frame(self, column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)
- Make a box plot from DataFrame columns.
Make a box-and-whisker plot from DataFrame columns, optionally grouped
by some other columns. A box plot is a method for graphically depicting
groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of box to show the range of the data. By default, they extend no more than
`1.5 * IQR (IQR = Q3 - Q1)` from the edges of the box, ending at the farthest
data point within that interval. Outliers are plotted as separate dots.
For further details see
Wikipedia's entry for `boxplot <https://en.wikipedia.org/wiki/Box_plot>`_.
Parameters
----------
column : str or list of str, optional
Column name or list of names, or vector.
Can be any valid input to :meth:`pandas.DataFrame.groupby`.
by : str or array-like, optional
Column in the DataFrame to :meth:`pandas.DataFrame.groupby`.
One box-plot will be done per value of columns in `by`.
ax : object of class matplotlib.axes.Axes, optional
The matplotlib axes to be used by boxplot.
fontsize : float or str
Tick label font size in points or as a string (e.g., `large`).
rot : int or float, default 0
The rotation angle of labels (in degrees)
with respect to the screen coordinate system.
grid : bool, default True
Setting this to True will show the grid.
figsize : A tuple (width, height) in inches
The size of the figure to create in matplotlib.
layout : tuple (rows, columns), optional
For example, (3, 5) will display the subplots
using 3 columns and 5 rows, starting from the top-left.
return_type : {'axes', 'dict', 'both'} or None, default 'axes'
The kind of object to return. The default is ``axes``.
* 'axes' returns the matplotlib axes the boxplot is drawn on.
* 'dict' returns a dictionary whose values are the matplotlib
Lines of the boxplot.
* 'both' returns a namedtuple with the axes and dict.
* when grouping with ``by``, a Series mapping columns to
``return_type`` is returned.
If ``return_type`` is `None`, a NumPy array
of axes with the same shape as ``layout`` is returned.
backend : str, default None
Backend to use instead of the backend specified in the option
``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
specify the ``plotting.backend`` for the whole session, set
``pd.options.plotting.backend``.
.. versionadded:: 1.0.0
**kwargs
All other plotting keyword arguments to be passed to
:func:`matplotlib.pyplot.boxplot`.
Returns
-------
result
See Notes.
See Also
--------
Series.plot.hist: Make a histogram.
matplotlib.pyplot.boxplot : Matplotlib equivalent plot.
Notes
-----
The return type depends on the `return_type` parameter:
* 'axes' : object of class matplotlib.axes.Axes
* 'dict' : dict of matplotlib.lines.Line2D objects
* 'both' : a namedtuple with structure (ax, lines)
For data grouped with ``by``, return a Series of the above or a numpy
array:
* :class:`~pandas.Series`
* :class:`~numpy.array` (for ``return_type = None``)
Use ``return_type='dict'`` when you want to tweak the appearance
of the lines after plotting. In this case a dict containing the Lines
making up the boxes, caps, fliers, medians, and whiskers is returned.
Examples
--------
Boxplots can be created for every column in the dataframe
by ``df.boxplot()`` or indicating the columns to be used:
.. plot::
:context: close-figs
>>> np.random.seed(1234)
>>> df = pd.DataFrame(np.random.randn(10, 4),
... columns=['Col1', 'Col2', 'Col3', 'Col4'])
>>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3']) # doctest: +SKIP
Boxplots of variables distributions grouped by the values of a third
variable can be created using the option ``by``. For instance:
.. plot::
:context: close-figs
>>> df = pd.DataFrame(np.random.randn(10, 2),
... columns=['Col1', 'Col2'])
>>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
... 'B', 'B', 'B', 'B', 'B'])
>>> boxplot = df.boxplot(by='X')
A list of strings (i.e. ``['X', 'Y']``) can be passed to boxplot
in order to group the data by combination of the variables in the x-axis:
.. plot::
:context: close-figs
>>> df = pd.DataFrame(np.random.randn(10, 3),
... columns=['Col1', 'Col2', 'Col3'])
>>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
... 'B', 'B', 'B', 'B', 'B'])
>>> df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A',
... 'B', 'A', 'B', 'A', 'B'])
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])
The layout of boxplot can be adjusted giving a tuple to ``layout``:
.. plot::
:context: close-figs
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
... layout=(2, 1))
Additional formatting can be done to the boxplot, like suppressing the grid
(``grid=False``), rotating the labels in the x-axis (i.e. ``rot=45``)
or changing the fontsize (i.e. ``fontsize=15``):
.. plot::
:context: close-figs
>>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15) # doctest: +SKIP
The parameter ``return_type`` can be used to select the type of element
returned by `boxplot`. When ``return_type='axes'`` is selected,
the matplotlib axes on which the boxplot is drawn are returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._subplots.AxesSubplot'>
When grouping with ``by``, a Series mapping columns to ``return_type``
is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
... return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>
If ``return_type`` is `None`, a NumPy array of axes with the same shape
as ``layout`` is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
... return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>
- clip(self: 'DataFrame', lower=None, upper=None, axis: 'Axis | None' = None, inplace: 'bool' = False, *args, **kwargs) -> 'DataFrame | None'
- Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds
can be singular values or array like, and in the latter case
the clipping is performed element-wise in the specified axis.
Parameters
----------
lower : float or array-like, default None
Minimum threshold value. All values below this
threshold will be set to it. A missing
threshold (e.g `NA`) will not clip the value.
upper : float or array-like, default None
Maximum threshold value. All values above this
threshold will be set to it. A missing
threshold (e.g `NA`) will not clip the value.
axis : int or str axis name, optional
Align object with lower and upper along the given axis.
inplace : bool, default False
Whether to perform the operation in place on the data.
*args, **kwargs
Additional keywords have no effect but might be accepted
for compatibility with numpy.
Returns
-------
Series or DataFrame or None
Same type as calling object with the values outside the
clip boundaries replaced or None if ``inplace=True``.
See Also
--------
Series.clip : Trim values at input threshold in series.
DataFrame.clip : Trim values at input threshold in dataframe.
numpy.clip : Clip (limit) the values in an array.
Examples
--------
>>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
>>> df = pd.DataFrame(data)
>>> df
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
Clips per column using lower and upper thresholds:
>>> df.clip(-4, 6)
col_0 col_1
0 6 -2
1 -3 -4
2 0 6
3 -1 6
4 5 -4
Clips using specific lower and upper thresholds per column element:
>>> t = pd.Series([2, -4, -1, 6, 3])
>>> t
0 2
1 -4
2 -1
3 6
4 3
dtype: int64
>>> df.clip(t, t + 4, axis=0)
col_0 col_1
0 6 2
1 -3 -4
2 0 3
3 6 8
4 5 3
Clips using specific lower threshold per column element, with missing values:
>>> t = pd.Series([2, -4, np.NaN, 6, 3])
>>> t
0 2.0
1 -4.0
2 NaN
3 6.0
4 3.0
dtype: float64
>>> df.clip(t, axis=0)
col_0 col_1
0 9 2
1 -3 -4
2 0 6
3 6 8
4 5 3
- combine(self, other: 'DataFrame', func, fill_value=None, overwrite: 'bool' = True) -> 'DataFrame'
- Perform column-wise combine with another DataFrame.
Combines a DataFrame with `other` DataFrame using `func`
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.
Parameters
----------
other : DataFrame
The DataFrame to merge column-wise.
func : function
Function that takes two series as inputs and return a Series or a
scalar. Used to merge the two dataframes column by columns.
fill_value : scalar value, default None
The value to fill NaNs with prior to passing any column to the
merge func.
overwrite : bool, default True
If True, columns in `self` that do not exist in `other` will be
overwritten with NaNs.
Returns
-------
DataFrame
Combination of the provided DataFrames.
See Also
--------
DataFrame.combine_first : Combine two DataFrame objects and default to
non-null values in frame calling the method.
Examples
--------
Combine using a simple function that chooses the smaller column.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
A B
0 0 3
1 0 3
Example using a true element-wise combine function.
>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
A B
0 1 2
1 0 3
Using `fill_value` fills Nones prior to passing the column to the
merge function.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
A B
0 0 -5.0
1 0 4.0
However, if the same element in both dataframes is None, that None
is preserved
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
A B
0 0 -5.0
1 0 3.0
Example that demonstrates the use of `overwrite` and behavior when
the axis differ between the dataframes.
>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
A B C
0 NaN NaN NaN
1 NaN 3.0 -10.0
2 NaN 3.0 1.0
>>> df1.combine(df2, take_smaller, overwrite=False)
A B C
0 0.0 NaN NaN
1 0.0 3.0 -10.0
2 NaN 3.0 1.0
Demonstrating the preference of the passed in dataframe.
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
A B C
0 0.0 NaN NaN
1 0.0 3.0 NaN
2 NaN 3.0 NaN
>>> df2.combine(df1, take_smaller, overwrite=False)
A B C
0 0.0 NaN NaN
1 0.0 3.0 1.0
2 NaN 3.0 1.0
- combine_first(self, other: 'DataFrame') -> 'DataFrame'
- Update null elements with value in the same location in `other`.
Combine two DataFrame objects by filling null values in one DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two.
Parameters
----------
other : DataFrame
Provided DataFrame to use to fill null values.
Returns
-------
DataFrame
The result of combining the provided DataFrame with the other object.
See Also
--------
DataFrame.combine : Perform series-wise operation on two DataFrames
using a given function.
Examples
--------
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
A B
0 1.0 3.0
1 0.0 4.0
Null values still persist if the location of that null value
does not exist in `other`
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
A B C
0 NaN 4.0 NaN
1 0.0 3.0 1.0
2 NaN 3.0 1.0
- compare(self, other: 'DataFrame', align_axis: 'Axis' = 1, keep_shape: 'bool' = False, keep_equal: 'bool' = False) -> 'DataFrame'
- Compare to another DataFrame and show the differences.
.. versionadded:: 1.1.0
Parameters
----------
other : DataFrame
Object to compare with.
align_axis : {0 or 'index', 1 or 'columns'}, default 1
Determine which axis to align the comparison on.
* 0, or 'index' : Resulting differences are stacked vertically
with rows drawn alternately from self and other.
* 1, or 'columns' : Resulting differences are aligned horizontally
with columns drawn alternately from self and other.
keep_shape : bool, default False
If true, all rows and columns are kept.
Otherwise, only the ones with different values are kept.
keep_equal : bool, default False
If true, the result keeps values that are equal.
Otherwise, equal values are shown as NaNs.
Returns
-------
DataFrame
DataFrame that shows the differences stacked side by side.
The resulting index will be a MultiIndex with 'self' and 'other'
stacked alternately at the inner level.
Raises
------
ValueError
When the two DataFrames don't have identical labels or shape.
See Also
--------
Series.compare : Compare with another Series and show differences.
DataFrame.equals : Test whether two objects contain the same elements.
Notes
-----
Matching NaNs will not appear as a difference.
Can only compare identically-labeled
(i.e. same shape, identical row and column labels) DataFrames
Examples
--------
>>> df = pd.DataFrame(
... {
... "col1": ["a", "a", "b", "b", "a"],
... "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
... },
... columns=["col1", "col2", "col3"],
... )
>>> df
col1 col2 col3
0 a 1.0 1.0
1 a 2.0 2.0
2 b 3.0 3.0
3 b NaN 4.0
4 a 5.0 5.0
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
col1 col2 col3
0 c 1.0 1.0
1 a 2.0 2.0
2 b 3.0 4.0
3 b NaN 4.0
4 a 5.0 5.0
Align the differences on columns
>>> df.compare(df2)
col1 col3
self other self other
0 a c NaN NaN
2 NaN NaN 3.0 4.0
Stack the differences on rows
>>> df.compare(df2, align_axis=0)
col1 col3
0 self a NaN
other c NaN
2 self NaN 3.0
other NaN 4.0
Keep the equal values
>>> df.compare(df2, keep_equal=True)
col1 col3
self other self other
0 a c 1.0 1.0
2 b b 3.0 4.0
Keep all original rows and columns
>>> df.compare(df2, keep_shape=True)
col1 col2 col3
self other self other self other
0 a c NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN 3.0 4.0
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN
Keep all original rows and columns and also all original values
>>> df.compare(df2, keep_shape=True, keep_equal=True)
col1 col2 col3
self other self other self other
0 a c 1.0 1.0 1.0 1.0
1 a a 2.0 2.0 2.0 2.0
2 b b 3.0 3.0 3.0 4.0
3 b b NaN NaN 4.0 4.0
4 a a 5.0 5.0 5.0 5.0
- corr(self, method: 'str | Callable[[np.ndarray, np.ndarray], float]' = 'pearson', min_periods: 'int' = 1) -> 'DataFrame'
- Compute pairwise correlation of columns, excluding NA/null values.
Parameters
----------
method : {'pearson', 'kendall', 'spearman'} or callable
Method of correlation:
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable with input two 1d ndarrays
and returning a float. Note that the returned matrix from corr
will have 1 along the diagonals and will be symmetric
regardless of the callable's behavior.
min_periods : int, optional
Minimum number of observations required per pair of columns
to have a valid result. Currently only available for Pearson
and Spearman correlation.
Returns
-------
DataFrame
Correlation matrix.
See Also
--------
DataFrame.corrwith : Compute pairwise correlation with another
DataFrame or Series.
Series.corr : Compute the correlation between two Series.
Examples
--------
>>> def histogram_intersection(a, b):
... v = np.minimum(a, b).sum().round(decimals=1)
... return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
... columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
dogs cats
dogs 1.0 0.3
cats 0.3 1.0
- corrwith(self, other, axis: 'Axis' = 0, drop=False, method='pearson') -> 'Series'
- Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of
DataFrame with rows or columns of Series or DataFrame. DataFrames
are first aligned along both axes before computing the
correlations.
Parameters
----------
other : DataFrame, Series
Object with which to compute correlations.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to use. 0 or 'index' to compute column-wise, 1 or 'columns' for
row-wise.
drop : bool, default False
Drop missing indices from result.
method : {'pearson', 'kendall', 'spearman'} or callable
Method of correlation:
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable with input two 1d ndarrays
and returning a float.
Returns
-------
Series
Pairwise correlations.
See Also
--------
DataFrame.corr : Compute pairwise correlation of columns.
- count(self, axis: 'Axis' = 0, level: 'Level | None' = None, numeric_only: 'bool' = False)
- Count non-NA cells for each column or row.
The values `None`, `NaN`, `NaT`, and optionally `numpy.inf` (depending
on `pandas.options.mode.use_inf_as_na`) are considered NA.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
If 0 or 'index' counts are generated for each column.
If 1 or 'columns' counts are generated for each row.
level : int or str, optional
If the axis is a `MultiIndex` (hierarchical), count along a
particular `level`, collapsing into a `DataFrame`.
A `str` specifies the level name.
numeric_only : bool, default False
Include only `float`, `int` or `boolean` data.
Returns
-------
Series or DataFrame
For each column/row the number of non-NA/null entries.
If `level` is specified returns a `DataFrame`.
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.value_counts: Count unique combinations of columns.
DataFrame.shape: Number of DataFrame rows and columns (including NA
elements).
DataFrame.isna: Boolean same-sized DataFrame showing places of NA
elements.
Examples
--------
Constructing DataFrame from a dictionary:
>>> df = pd.DataFrame({"Person":
... ["John", "Myla", "Lewis", "John", "Myla"],
... "Age": [24., np.nan, 21., 33, 26],
... "Single": [False, True, True, True, False]})
>>> df
Person Age Single
0 John 24.0 False
1 Myla NaN True
2 Lewis 21.0 True
3 John 33.0 True
4 Myla 26.0 False
Notice the uncounted NA values:
>>> df.count()
Person 5
Age 4
Single 5
dtype: int64
Counts for each **row**:
>>> df.count(axis='columns')
0 3
1 2
2 3
3 3
4 3
dtype: int64
- cov(self, min_periods: 'int | None' = None, ddof: 'int | None' = 1) -> 'DataFrame'
- Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a DataFrame.
The returned data frame is the `covariance matrix
<https://en.wikipedia.org/wiki/Covariance_matrix>`__ of the columns
of the DataFrame.
Both NA and null values are automatically excluded from the
calculation. (See the note below about bias from missing values.)
A threshold can be set for the minimum number of
observations for each value created. Comparisons with observations
below this threshold will be returned as ``NaN``.
This method is generally used for the analysis of time series data to
understand the relationship between different measures
across time.
Parameters
----------
min_periods : int, optional
Minimum number of observations required per pair of columns
to have a valid result.
ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations
is ``N - ddof``, where ``N`` represents the number of elements.
.. versionadded:: 1.1.0
Returns
-------
DataFrame
The covariance matrix of the series of the DataFrame.
See Also
--------
Series.cov : Compute covariance with another Series.
core.window.ExponentialMovingWindow.cov: Exponential weighted sample covariance.
core.window.Expanding.cov : Expanding sample covariance.
core.window.Rolling.cov : Rolling sample covariance.
Notes
-----
Returns the covariance matrix of the DataFrame's time series.
The covariance is normalized by N-ddof.
For DataFrames that have Series that are missing data (assuming that
data is `missing at random
<https://en.wikipedia.org/wiki/Missing_data#Missing_at_random>`__)
the returned covariance matrix will be an unbiased estimate
of the variance and covariance between the member Series.
However, for many applications this estimate may not be acceptable
because the estimate covariance matrix is not guaranteed to be positive
semi-definite. This could lead to estimate correlations having
absolute values which are greater than one, and/or a non-invertible
covariance matrix. See `Estimation of covariance matrices
<https://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_
matrices>`__ for more details.
Examples
--------
>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
... columns=['dogs', 'cats'])
>>> df.cov()
dogs cats
dogs 0.666667 -1.000000
cats -1.000000 1.666667
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(1000, 5),
... columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov()
a b c d e
a 0.998438 -0.020161 0.059277 -0.008943 0.014144
b -0.020161 1.059352 -0.008543 -0.024738 0.009826
c 0.059277 -0.008543 1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486 0.921297 -0.013692
e 0.014144 0.009826 -0.000271 -0.013692 0.977795
**Minimum number of periods**
This method also supports an optional ``min_periods`` keyword
that specifies the required minimum number of non-NA observations for
each column pair in order to have a valid result:
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.randn(20, 3),
... columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = np.nan
>>> df.loc[df.index[5:10], 'b'] = np.nan
>>> df.cov(min_periods=12)
a b c
a 0.316741 NaN -0.150812
b NaN 1.248003 0.191417
c -0.150812 0.191417 0.895202
- cummax(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
maximum.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
Series or DataFrame
Return cumulative maximum of Series or DataFrame.
See Also
--------
core.window.Expanding.max : Similar functionality
but ignores ``NaN`` values.
DataFrame.max : Return the maximum over
DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cummax()
0 2.0
1 NaN
2 5.0
3 5.0
4 5.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cummax(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the maximum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cummax()
A B
0 2.0 1.0
1 3.0 NaN
2 3.0 1.0
To iterate over columns and find the maximum in each row,
use ``axis=1``
>>> df.cummax(axis=1)
A B
0 2.0 2.0
1 3.0 NaN
2 1.0 1.0
- cummin(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
minimum.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
Series or DataFrame
Return cumulative minimum of Series or DataFrame.
See Also
--------
core.window.Expanding.min : Similar functionality
but ignores ``NaN`` values.
DataFrame.min : Return the minimum over
DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cummin()
0 2.0
1 NaN
2 2.0
3 -1.0
4 -1.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cummin(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the minimum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cummin()
A B
0 2.0 1.0
1 2.0 NaN
2 1.0 0.0
To iterate over columns and find the minimum in each row,
use ``axis=1``
>>> df.cummin(axis=1)
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
- cumprod(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
product.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
Series or DataFrame
Return cumulative product of Series or DataFrame.
See Also
--------
core.window.Expanding.prod : Similar functionality
but ignores ``NaN`` values.
DataFrame.prod : Return the product over
DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cumprod()
0 2.0
1 NaN
2 10.0
3 -10.0
4 -0.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cumprod(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the product
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cumprod()
A B
0 2.0 1.0
1 6.0 NaN
2 6.0 0.0
To iterate over columns and find the product in each row,
use ``axis=1``
>>> df.cumprod(axis=1)
A B
0 2.0 2.0
1 3.0 NaN
2 1.0 0.0
- cumsum(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
sum.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
Series or DataFrame
Return cumulative sum of Series or DataFrame.
See Also
--------
core.window.Expanding.sum : Similar functionality
but ignores ``NaN`` values.
DataFrame.sum : Return the sum over
DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cumsum()
0 2.0
1 NaN
2 7.0
3 6.0
4 6.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cumsum(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the sum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cumsum()
A B
0 2.0 1.0
1 5.0 NaN
2 6.0 1.0
To iterate over columns and find the sum in each row,
use ``axis=1``
>>> df.cumsum(axis=1)
A B
0 2.0 3.0
1 3.0 NaN
2 1.0 1.0
- diff(self, periods: 'int' = 1, axis: 'Axis' = 0) -> 'DataFrame'
- First discrete difference of element.
Calculates the difference of a Dataframe element compared with another
element in the Dataframe (default is element in previous row).
Parameters
----------
periods : int, default 1
Periods to shift for calculating difference, accepts negative
values.
axis : {0 or 'index', 1 or 'columns'}, default 0
Take difference over rows (0) or columns (1).
Returns
-------
Dataframe
First differences of the Series.
See Also
--------
Dataframe.pct_change: Percent change over given number of periods.
Dataframe.shift: Shift index by desired number of periods with an
optional time freq.
Series.diff: First discrete difference of object.
Notes
-----
For boolean dtypes, this uses :meth:`operator.xor` rather than
:meth:`operator.sub`.
The result is calculated according to current dtype in Dataframe,
however dtype of the result is always float64.
Examples
--------
Difference with previous row
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
... 'b': [1, 1, 2, 3, 5, 8],
... 'c': [1, 4, 9, 16, 25, 36]})
>>> df
a b c
0 1 1 1
1 2 1 4
2 3 2 9
3 4 3 16
4 5 5 25
5 6 8 36
>>> df.diff()
a b c
0 NaN NaN NaN
1 1.0 0.0 3.0
2 1.0 1.0 5.0
3 1.0 1.0 7.0
4 1.0 2.0 9.0
5 1.0 3.0 11.0
Difference with previous column
>>> df.diff(axis=1)
a b c
0 NaN 0 0
1 NaN -1 3
2 NaN -1 7
3 NaN -1 13
4 NaN 0 20
5 NaN 2 28
Difference with 3rd previous row
>>> df.diff(periods=3)
a b c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 3.0 2.0 15.0
4 3.0 4.0 21.0
5 3.0 6.0 27.0
Difference with following row
>>> df.diff(periods=-1)
a b c
0 -1.0 0.0 -3.0
1 -1.0 -1.0 -5.0
2 -1.0 -1.0 -7.0
3 -1.0 -2.0 -9.0
4 -1.0 -3.0 -11.0
5 NaN NaN NaN
Overflow in input dtype
>>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
>>> df.diff()
a
0 NaN
1 255.0
- div = truediv(self, other, axis='columns', level=None, fill_value=None)
- divide = truediv(self, other, axis='columns', level=None, fill_value=None)
- dot(self, other: 'AnyArrayLike | DataFrame') -> 'DataFrame | Series'
- Compute the matrix multiplication between the DataFrame and other.
This method computes the matrix product between the DataFrame and the
values of an other Series, DataFrame or a numpy array.
It can also be called using ``self @ other`` in Python >= 3.5.
Parameters
----------
other : Series, DataFrame or array-like
The other object to compute the matrix product with.
Returns
-------
Series or DataFrame
If other is a Series, return the matrix product between self and
other as a Series. If other is a DataFrame or a numpy.array, return
the matrix product of self and other in a DataFrame of a np.array.
See Also
--------
Series.dot: Similar method for Series.
Notes
-----
The dimensions of DataFrame and other must be compatible in order to
compute the matrix multiplication. In addition, the column names of
DataFrame and the index of other must contain the same values, as they
will be aligned prior to the multiplication.
The dot method for Series computes the inner product, instead of the
matrix product here.
Examples
--------
Here we multiply a DataFrame with a Series.
>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0 -4
1 5
dtype: int64
Here we multiply a DataFrame with another DataFrame.
>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
0 1
0 1 4
1 2 2
Note that the dot method give the same result as @
>>> df @ other
0 1
0 1 4
1 2 2
The dot method works also if other is an np.array.
>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
0 1
0 1 4
1 2 2
Note how shuffling of the objects does not change the result.
>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2)
0 -4
1 5
dtype: int64
- drop(self, labels=None, axis: 'Axis' = 0, index=None, columns=None, level: 'Level | None' = None, inplace: 'bool' = False, errors: 'str' = 'raise')
- Drop specified labels from rows or columns.
Remove rows or columns by specifying label names and corresponding
axis, or by specifying directly index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level. See the `user guide <advanced.shown_levels>`
for more information about the now unused levels.
Parameters
----------
labels : single label or list-like
Index or column labels to drop. A tuple will be used as a single
label and not treated as a list-like.
axis : {0 or 'index', 1 or 'columns'}, default 0
Whether to drop labels from the index (0 or 'index') or
columns (1 or 'columns').
index : single label or list-like
Alternative to specifying axis (``labels, axis=0``
is equivalent to ``index=labels``).
columns : single label or list-like
Alternative to specifying axis (``labels, axis=1``
is equivalent to ``columns=labels``).
level : int or level name, optional
For MultiIndex, level from which the labels will be removed.
inplace : bool, default False
If False, return a copy. Otherwise, do operation
inplace and return None.
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and only existing labels are
dropped.
Returns
-------
DataFrame or None
DataFrame without the removed index or column labels or
None if ``inplace=True``.
Raises
------
KeyError
If any of the labels is not found in the selected axis.
See Also
--------
DataFrame.loc : Label-location based indexer for selection by label.
DataFrame.dropna : Return DataFrame with labels on given axis omitted
where (all or any) data are missing.
DataFrame.drop_duplicates : Return DataFrame with duplicate rows
removed, optionally only considering certain columns.
Series.drop : Return Series with specified index labels removed.
Examples
--------
>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
... columns=['A', 'B', 'C', 'D'])
>>> df
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
Drop columns
>>> df.drop(['B', 'C'], axis=1)
A D
0 0 3
1 4 7
2 8 11
>>> df.drop(columns=['B', 'C'])
A D
0 0 3
1 4 7
2 8 11
Drop a row by index
>>> df.drop([0, 1])
A B C D
2 8 9 10 11
Drop columns and/or rows of MultiIndex DataFrame
>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
... ['speed', 'weight', 'length']],
... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
... data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
... [250, 150], [1.5, 0.8], [320, 250],
... [1, 0.8], [0.3, 0.2]])
>>> df
big small
lama speed 45.0 30.0
weight 200.0 100.0
length 1.5 1.0
cow speed 30.0 20.0
weight 250.0 150.0
length 1.5 0.8
falcon speed 320.0 250.0
weight 1.0 0.8
length 0.3 0.2
Drop a specific index combination from the MultiIndex
DataFrame, i.e., drop the combination ``'falcon'`` and
``'weight'``, which deletes only the corresponding row
>>> df.drop(index=('falcon', 'weight'))
big small
lama speed 45.0 30.0
weight 200.0 100.0
length 1.5 1.0
cow speed 30.0 20.0
weight 250.0 150.0
length 1.5 0.8
falcon speed 320.0 250.0
length 0.3 0.2
>>> df.drop(index='cow', columns='small')
big
lama speed 45.0
weight 200.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
>>> df.drop(index='length', level=1)
big small
lama speed 45.0 30.0
weight 200.0 100.0
cow speed 30.0 20.0
weight 250.0 150.0
falcon speed 320.0 250.0
weight 1.0 0.8
- drop_duplicates(self, subset: 'Hashable | Sequence[Hashable] | None' = None, keep: "Literal['first'] | Literal['last'] | Literal[False]" = 'first', inplace: 'bool' = False, ignore_index: 'bool' = False) -> 'DataFrame | None'
- Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes
are ignored.
Parameters
----------
subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by
default use all of the columns.
keep : {'first', 'last', False}, default 'first'
Determines which duplicates (if any) to keep.
- ``first`` : Drop duplicates except for the first occurrence.
- ``last`` : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
inplace : bool, default False
Whether to drop duplicates in place or to return a copy.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.0.0
Returns
-------
DataFrame or None
DataFrame with duplicates removed or None if ``inplace=True``.
See Also
--------
DataFrame.value_counts: Count unique combinations of columns.
Examples
--------
Consider dataset containing ramen rating.
>>> df = pd.DataFrame({
... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
By default, it removes duplicate rows based on all columns.
>>> df.drop_duplicates()
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
To remove duplicates on specific column(s), use ``subset``.
>>> df.drop_duplicates(subset=['brand'])
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
To remove duplicates and keep last occurrences, use ``keep``.
>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
brand style rating
1 Yum Yum cup 4.0
2 Indomie cup 3.5
4 Indomie pack 5.0
- dropna(self, axis: 'Axis' = 0, how: 'str' = 'any', thresh=None, subset: 'IndexLabel' = None, inplace: 'bool' = False)
- Remove missing values.
See the :ref:`User Guide <missing_data>` for more on which values are
considered missing, and how to work with missing data.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
Determine if rows or columns which contain missing values are
removed.
* 0, or 'index' : Drop rows which contain missing values.
* 1, or 'columns' : Drop columns which contain missing value.
.. versionchanged:: 1.0.0
Pass tuple or list to drop on multiple axes.
Only a single axis is allowed.
how : {'any', 'all'}, default 'any'
Determine if row or column is removed from DataFrame, when we have
at least one NA or all NA.
* 'any' : If any NA values are present, drop that row or column.
* 'all' : If all values are NA, drop that row or column.
thresh : int, optional
Require that many non-NA values.
subset : column label or sequence of labels, optional
Labels along other axis to consider, e.g. if you are dropping rows
these would be a list of columns to include.
inplace : bool, default False
If True, do operation inplace and return None.
Returns
-------
DataFrame or None
DataFrame with NA entries dropped from it or None if ``inplace=True``.
See Also
--------
DataFrame.isna: Indicate missing values.
DataFrame.notna : Indicate existing (non-missing) values.
DataFrame.fillna : Replace missing values.
Series.dropna : Drop missing values.
Index.dropna : Drop missing indices.
Examples
--------
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
... "toy": [np.nan, 'Batmobile', 'Bullwhip'],
... "born": [pd.NaT, pd.Timestamp("1940-04-25"),
... pd.NaT]})
>>> df
name toy born
0 Alfred NaN NaT
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
Drop the rows where at least one element is missing.
>>> df.dropna()
name toy born
1 Batman Batmobile 1940-04-25
Drop the columns where at least one element is missing.
>>> df.dropna(axis='columns')
name
0 Alfred
1 Batman
2 Catwoman
Drop the rows where all elements are missing.
>>> df.dropna(how='all')
name toy born
0 Alfred NaN NaT
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
Keep only the rows with at least 2 non-NA values.
>>> df.dropna(thresh=2)
name toy born
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
Define in which columns to look for missing values.
>>> df.dropna(subset=['name', 'toy'])
name toy born
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
Keep the DataFrame with valid entries in the same variable.
>>> df.dropna(inplace=True)
>>> df
name toy born
1 Batman Batmobile 1940-04-25
- duplicated(self, subset: 'Hashable | Sequence[Hashable] | None' = None, keep: "Literal['first'] | Literal['last'] | Literal[False]" = 'first') -> 'Series'
- Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
Parameters
----------
subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by
default use all of the columns.
keep : {'first', 'last', False}, default 'first'
Determines which duplicates (if any) to mark.
- ``first`` : Mark duplicates as ``True`` except for the first occurrence.
- ``last`` : Mark duplicates as ``True`` except for the last occurrence.
- False : Mark all duplicates as ``True``.
Returns
-------
Series
Boolean series for each duplicated rows.
See Also
--------
Index.duplicated : Equivalent method on index.
Series.duplicated : Equivalent method on Series.
Series.drop_duplicates : Remove duplicate values from Series.
DataFrame.drop_duplicates : Remove duplicate values from DataFrame.
Examples
--------
Consider dataset containing ramen rating.
>>> df = pd.DataFrame({
... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
By default, for each set of duplicated values, the first occurrence
is set on False and all others on True.
>>> df.duplicated()
0 False
1 True
2 False
3 False
4 False
dtype: bool
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True.
>>> df.duplicated(keep='last')
0 True
1 False
2 False
3 False
4 False
dtype: bool
By setting ``keep`` on False, all duplicates are True.
>>> df.duplicated(keep=False)
0 True
1 True
2 False
3 False
4 False
dtype: bool
To find duplicates on specific column(s), use ``subset``.
>>> df.duplicated(subset=['brand'])
0 False
1 True
2 False
3 True
4 True
dtype: bool
- eq(self, other, axis='columns', level=None)
- Get Equal to of dataframe and other, element-wise (binary operator `eq`).
Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison
operators.
Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis
(rows or columns) and level for comparison.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns').
level : int or label
Broadcast across a level, matching Index values on the passed
MultiIndex level.
Returns
-------
DataFrame of bool
Result of the comparison.
See Also
--------
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
Notes
-----
Mismatched indices will be unioned together.
`NaN` values are considered different (i.e. `NaN` != `NaN`).
Examples
--------
>>> df = pd.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df
cost revenue
A 250 100
B 150 250
C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
cost revenue
A False True
B False False
C True False
>>> df.eq(100)
cost revenue
A False True
B False False
C True False
When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
cost revenue
A True True
B True False
C False True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
cost revenue
A True False
B True True
C True True
D True True
When comparing to an arbitrary sequence, the number of columns must
match the number elements in `other`:
>>> df == [250, 100]
cost revenue
A True True
B False False
C False False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
cost revenue
A True False
B False True
C True False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other
revenue
A 300
B 250
C 100
D 150
>>> df.gt(other)
cost revenue
A False False
B False False
C False True
D False False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
>>> df.le(df_multindex, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
- eval(self, expr: 'str', inplace: 'bool' = False, **kwargs)
- Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows
`eval` to run arbitrary code, which can make you vulnerable to code
injection if you pass user input to this function.
Parameters
----------
expr : str
The expression string to evaluate.
inplace : bool, default False
If the expression contains an assignment, whether to perform the
operation inplace and mutate the existing DataFrame. Otherwise,
a new DataFrame is returned.
**kwargs
See the documentation for :func:`eval` for complete details
on the keyword arguments accepted by
:meth:`~pandas.DataFrame.query`.
Returns
-------
ndarray, scalar, pandas object, or None
The result of the evaluation or None if ``inplace=True``.
See Also
--------
DataFrame.query : Evaluates a boolean expression to query the columns
of a frame.
DataFrame.assign : Can evaluate an expression or function to create new
values for a column.
eval : Evaluate a Python expression as a string using various
backends.
Notes
-----
For more details see the API documentation for :func:`~eval`.
For detailed examples see :ref:`enhancing performance with eval
<enhancingperf.eval>`.
Examples
--------
>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
A B
0 1 10
1 2 8
2 3 6
3 4 4
4 5 2
>>> df.eval('A + B')
0 11
1 10
2 9
3 8
4 7
dtype: int64
Assignment is allowed though by default the original DataFrame is not
modified.
>>> df.eval('C = A + B')
A B C
0 1 10 11
1 2 8 10
2 3 6 9
3 4 4 8
4 5 2 7
>>> df
A B
0 1 10
1 2 8
2 3 6
3 4 4
4 5 2
Use ``inplace=True`` to modify the original DataFrame.
>>> df.eval('C = A + B', inplace=True)
>>> df
A B C
0 1 10 11
1 2 8 10
2 3 6 9
3 4 4 8
4 5 2 7
Multiple columns can be assigned to using multi-line expressions:
>>> df.eval(
... '''
... C = A + B
... D = A - B
... '''
... )
A B C D
0 1 10 11 -9
1 2 8 10 -6
2 3 6 9 -3
3 4 4 8 0
4 5 2 7 3
- explode(self, column: 'IndexLabel', ignore_index: 'bool' = False) -> 'DataFrame'
- Transform each element of a list-like to a row, replicating index values.
.. versionadded:: 0.25.0
Parameters
----------
column : IndexLabel
Column(s) to explode.
For multiple columns, specify a non-empty list with each element
be str or tuple, and all specified columns their list-like data
on same row of the frame must have matching length.
.. versionadded:: 1.3.0
Multi-column explode
ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.1.0
Returns
-------
DataFrame
Exploded lists to rows of the subset columns;
index will be duplicated for these rows.
Raises
------
ValueError :
* If columns of the frame are not unique.
* If specified columns to explode is empty list.
* If specified columns to explode have not matching count of
elements rowwise in the frame.
See Also
--------
DataFrame.unstack : Pivot a level of the (necessarily hierarchical)
index labels.
DataFrame.melt : Unpivot a DataFrame from wide format to long format.
Series.explode : Explode a DataFrame from list-like columns to long format.
Notes
-----
This routine will explode list-likes including lists, tuples, sets,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged, and empty list-likes will
result in a np.nan for that row. In addition, the ordering of rows in the
output will be non-deterministic when exploding sets.
Reference :ref:`the user guide <reshaping.explode>` for more examples.
Examples
--------
>>> df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
... 'B': 1,
... 'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
>>> df
A B C
0 [0, 1, 2] 1 [a, b, c]
1 foo 1 NaN
2 [] 1 []
3 [3, 4] 1 [d, e]
Single-column explode.
>>> df.explode('A')
A B C
0 0 1 [a, b, c]
0 1 1 [a, b, c]
0 2 1 [a, b, c]
1 foo 1 NaN
2 NaN 1 []
3 3 1 [d, e]
3 4 1 [d, e]
Multi-column explode.
>>> df.explode(list('AC'))
A B C
0 0 1 a
0 1 1 b
0 2 1 c
1 foo 1 NaN
2 NaN 1 NaN
3 3 1 d
3 4 1 e
- ffill(self: 'DataFrame', axis: 'None | Axis' = None, inplace: 'bool' = False, limit: 'None | int' = None, downcast=None) -> 'DataFrame | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='ffill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- fillna(self, value: 'object | ArrayLike | None' = None, method: 'FillnaOptions | None' = None, axis: 'Axis | None' = None, inplace: 'bool' = False, limit=None, downcast=None) -> 'DataFrame | None'
- Fill NA/NaN values using the specified method.
Parameters
----------
value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use next valid observation to fill gap.
axis : {0 or 'index', 1 or 'columns'}
Axis along which to fill missing values.
inplace : bool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
DataFrame or None
Object with missing values filled or None if ``inplace=True``.
See Also
--------
interpolate : Fill NaN values using interpolation.
reindex : Conform object to new index.
asfreq : Convert TimeSeries to specified frequency.
Examples
--------
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
... [3, 4, np.nan, 1],
... [np.nan, np.nan, np.nan, np.nan],
... [np.nan, 3, np.nan, 4]],
... columns=list("ABCD"))
>>> df
A B C D
0 NaN 2.0 NaN 0.0
1 3.0 4.0 NaN 1.0
2 NaN NaN NaN NaN
3 NaN 3.0 NaN 4.0
Replace all NaN elements with 0s.
>>> df.fillna(0)
A B C D
0 0.0 2.0 0.0 0.0
1 3.0 4.0 0.0 1.0
2 0.0 0.0 0.0 0.0
3 0.0 3.0 0.0 4.0
We can also propagate non-null values forward or backward.
>>> df.fillna(method="ffill")
A B C D
0 NaN 2.0 NaN 0.0
1 3.0 4.0 NaN 1.0
2 3.0 4.0 NaN 1.0
3 3.0 3.0 NaN 4.0
Replace all NaN elements in column 'A', 'B', 'C', and 'D', with 0, 1,
2, and 3 respectively.
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
A B C D
0 0.0 2.0 2.0 0.0
1 3.0 4.0 2.0 1.0
2 0.0 1.0 2.0 3.0
3 0.0 3.0 2.0 4.0
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
A B C D
0 0.0 2.0 2.0 0.0
1 3.0 4.0 NaN 1.0
2 NaN 1.0 NaN 3.0
3 NaN 3.0 NaN 4.0
When filling using a DataFrame, replacement happens along
the same column names and same indices
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
A B C D
0 0.0 2.0 0.0 0.0
1 3.0 4.0 0.0 1.0
2 0.0 0.0 0.0 NaN
3 0.0 3.0 0.0 4.0
Note that column D is not affected since it is not present in df2.
- floordiv(self, other, axis='columns', level=None, fill_value=None)
- Get Integer division of dataframe and other, element-wise (binary operator `floordiv`).
Equivalent to ``dataframe // other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rfloordiv`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- ge(self, other, axis='columns', level=None)
- Get Greater than or equal to of dataframe and other, element-wise (binary operator `ge`).
Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison
operators.
Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis
(rows or columns) and level for comparison.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns').
level : int or label
Broadcast across a level, matching Index values on the passed
MultiIndex level.
Returns
-------
DataFrame of bool
Result of the comparison.
See Also
--------
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
Notes
-----
Mismatched indices will be unioned together.
`NaN` values are considered different (i.e. `NaN` != `NaN`).
Examples
--------
>>> df = pd.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df
cost revenue
A 250 100
B 150 250
C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
cost revenue
A False True
B False False
C True False
>>> df.eq(100)
cost revenue
A False True
B False False
C True False
When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
cost revenue
A True True
B True False
C False True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
cost revenue
A True False
B True True
C True True
D True True
When comparing to an arbitrary sequence, the number of columns must
match the number elements in `other`:
>>> df == [250, 100]
cost revenue
A True True
B False False
C False False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
cost revenue
A True False
B False True
C True False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other
revenue
A 300
B 250
C 100
D 150
>>> df.gt(other)
cost revenue
A False False
B False False
C False True
D False False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
>>> df.le(df_multindex, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
- groupby(self, by=None, axis: 'Axis' = 0, level: 'Level | None' = None, as_index: 'bool' = True, sort: 'bool' = True, group_keys: 'bool' = True, squeeze: 'bool | lib.NoDefault' = <no_default>, observed: 'bool' = False, dropna: 'bool' = True) -> 'DataFrameGroupBy'
- Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
Parameters
----------
by : mapping, function, label, or list of labels
Used to determine the groups for the groupby.
If ``by`` is a function, it's called on each value of the object's
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series' values are first
aligned; see ``.align()`` method). If a list or ndarray of length
equal to the selected axis is passed (see the `groupby user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
the values are used as-is to determine the groups. A label or list
of labels may be passed to group by the columns in ``self``.
Notice that a tuple is interpreted as a (single) key.
axis : {0 or 'index', 1 or 'columns'}, default 0
Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular
level or levels.
as_index : bool, default True
For aggregated output, return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively "SQL-style" grouped output.
sort : bool, default True
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group.
group_keys : bool, default True
When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False
Reduce the dimensionality of the return type if possible,
otherwise return a consistent type.
.. deprecated:: 1.1.0
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
dropna : bool, default True
If True, and if group keys contain NA values, NA values together
with row/column will be dropped.
If False, NA values will also be treated as the key in groups.
.. versionadded:: 1.1.0
Returns
-------
DataFrameGroupBy
Returns a groupby object that contains information about the groups.
See Also
--------
resample : Convenience method for frequency conversion and resampling
of time series.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/groupby.html>`__ for more
detailed usage and examples, including splitting an object into groups,
iterating through groups, selecting a group, aggregation, and more.
Examples
--------
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df
Animal Max Speed
0 Falcon 380.0
1 Falcon 370.0
2 Parrot 24.0
3 Parrot 26.0
>>> df.groupby(['Animal']).mean()
Max Speed
Animal
Falcon 375.0
Parrot 25.0
**Hierarchical Indexes**
We can groupby different levels of a hierarchical index
using the `level` parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
... ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
... index=index)
>>> df
Max Speed
Animal Type
Falcon Captive 390.0
Wild 350.0
Parrot Captive 30.0
Wild 20.0
>>> df.groupby(level=0).mean()
Max Speed
Animal
Falcon 370.0
Parrot 25.0
>>> df.groupby(level="Type").mean()
Max Speed
Type
Captive 210.0
Wild 185.0
We can also choose to include NA in group keys or not by setting
`dropna` parameter, the default setting is `True`.
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
a c
b
1.0 2 3
2.0 2 5
>>> df.groupby(by=["b"], dropna=False).sum()
a c
b
1.0 2 3
2.0 2 5
NaN 1 4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum()
b c
a
a 13.0 13.0
b 12.3 123.0
>>> df.groupby(by="a", dropna=False).sum()
b c
a
a 13.0 13.0
b 12.3 123.0
NaN 12.3 33.0
- gt(self, other, axis='columns', level=None)
- Get Greater than of dataframe and other, element-wise (binary operator `gt`).
Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison
operators.
Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis
(rows or columns) and level for comparison.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns').
level : int or label
Broadcast across a level, matching Index values on the passed
MultiIndex level.
Returns
-------
DataFrame of bool
Result of the comparison.
See Also
--------
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
Notes
-----
Mismatched indices will be unioned together.
`NaN` values are considered different (i.e. `NaN` != `NaN`).
Examples
--------
>>> df = pd.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df
cost revenue
A 250 100
B 150 250
C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
cost revenue
A False True
B False False
C True False
>>> df.eq(100)
cost revenue
A False True
B False False
C True False
When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
cost revenue
A True True
B True False
C False True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
cost revenue
A True False
B True True
C True True
D True True
When comparing to an arbitrary sequence, the number of columns must
match the number elements in `other`:
>>> df == [250, 100]
cost revenue
A True True
B False False
C False False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
cost revenue
A True False
B False True
C True False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other
revenue
A 300
B 250
C 100
D 150
>>> df.gt(other)
cost revenue
A False False
B False False
C False True
D False False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
>>> df.le(df_multindex, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
- hist = hist_frame(data: 'DataFrame', column: 'IndexLabel' = None, by=None, grid: 'bool' = True, xlabelsize: 'int | None' = None, xrot: 'float | None' = None, ylabelsize: 'int | None' = None, yrot: 'float | None' = None, ax=None, sharex: 'bool' = False, sharey: 'bool' = False, figsize: 'tuple[int, int] | None' = None, layout: 'tuple[int, int] | None' = None, bins: 'int | Sequence[int]' = 10, backend: 'str | None' = None, legend: 'bool' = False, **kwargs)
- Make a histogram of the DataFrame's columns.
A `histogram`_ is a representation of the distribution of data.
This function calls :meth:`matplotlib.pyplot.hist`, on each series in
the DataFrame, resulting in one histogram per column.
.. _histogram: https://en.wikipedia.org/wiki/Histogram
Parameters
----------
data : DataFrame
The pandas object holding the data.
column : str or sequence, optional
If passed, will be used to limit data to a subset of columns.
by : object, optional
If passed, then used to form histograms for separate groups.
grid : bool, default True
Whether to show axis grid lines.
xlabelsize : int, default None
If specified changes the x-axis label size.
xrot : float, default None
Rotation of x axis labels. For example, a value of 90 displays the
x labels rotated 90 degrees clockwise.
ylabelsize : int, default None
If specified changes the y-axis label size.
yrot : float, default None
Rotation of y axis labels. For example, a value of 90 displays the
y labels rotated 90 degrees clockwise.
ax : Matplotlib axes object, default None
The axes to plot the histogram on.
sharex : bool, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to
invisible; defaults to True if ax is None otherwise False if an ax
is passed in.
Note that passing in both an ax and sharex=True will alter all x axis
labels for all subplots in a figure.
sharey : bool, default False
In case subplots=True, share y axis and set some y axis labels to
invisible.
figsize : tuple, optional
The size in inches of the figure to create. Uses the value in
`matplotlib.rcParams` by default.
layout : tuple, optional
Tuple of (rows, columns) for the layout of the histograms.
bins : int or sequence, default 10
Number of histogram bins to be used. If an integer is given, bins + 1
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
backend : str, default None
Backend to use instead of the backend specified in the option
``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
specify the ``plotting.backend`` for the whole session, set
``pd.options.plotting.backend``.
.. versionadded:: 1.0.0
legend : bool, default False
Whether to show the legend.
.. versionadded:: 1.1.0
**kwargs
All other plotting keyword arguments to be passed to
:meth:`matplotlib.pyplot.hist`.
Returns
-------
matplotlib.AxesSubplot or numpy.ndarray of them
See Also
--------
matplotlib.pyplot.hist : Plot a histogram using matplotlib.
Examples
--------
This example draws a histogram based on the length and width of
some animals, displayed in three bins
.. plot::
:context: close-figs
>>> df = pd.DataFrame({
... 'length': [1.5, 0.5, 1.2, 0.9, 3],
... 'width': [0.7, 0.2, 0.15, 0.2, 1.1]
... }, index=['pig', 'rabbit', 'duck', 'chicken', 'horse'])
>>> hist = df.hist(bins=3)
- idxmax(self, axis: 'Axis' = 0, skipna: 'bool' = True) -> 'Series'
- Return index of first occurrence of maximum over requested axis.
NA/null values are excluded.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
Returns
-------
Series
Indexes of maxima along the specified axis.
Raises
------
ValueError
* If the row/column is empty
See Also
--------
Series.idxmax : Return index of the maximum element.
Notes
-----
This method is the DataFrame version of ``ndarray.argmax``.
Examples
--------
Consider a dataset containing food consumption in Argentina.
>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
... 'co2_emissions': [37.2, 19.66, 1712]},
... index=['Pork', 'Wheat Products', 'Beef'])
>>> df
consumption co2_emissions
Pork 10.51 37.20
Wheat Products 103.11 19.66
Beef 55.48 1712.00
By default, it returns the index for the maximum value in each column.
>>> df.idxmax()
consumption Wheat Products
co2_emissions Beef
dtype: object
To return the index for the maximum value in each row, use ``axis="columns"``.
>>> df.idxmax(axis="columns")
Pork co2_emissions
Wheat Products consumption
Beef co2_emissions
dtype: object
- idxmin(self, axis: 'Axis' = 0, skipna: 'bool' = True) -> 'Series'
- Return index of first occurrence of minimum over requested axis.
NA/null values are excluded.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for column-wise.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
Returns
-------
Series
Indexes of minima along the specified axis.
Raises
------
ValueError
* If the row/column is empty
See Also
--------
Series.idxmin : Return index of the minimum element.
Notes
-----
This method is the DataFrame version of ``ndarray.argmin``.
Examples
--------
Consider a dataset containing food consumption in Argentina.
>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
... 'co2_emissions': [37.2, 19.66, 1712]},
... index=['Pork', 'Wheat Products', 'Beef'])
>>> df
consumption co2_emissions
Pork 10.51 37.20
Wheat Products 103.11 19.66
Beef 55.48 1712.00
By default, it returns the index for the minimum value in each column.
>>> df.idxmin()
consumption Pork
co2_emissions Wheat Products
dtype: object
To return the index for the minimum value in each row, use ``axis="columns"``.
>>> df.idxmin(axis="columns")
Pork consumption
Wheat Products co2_emissions
Beef consumption
dtype: object
- info(self, verbose: 'bool | None' = None, buf: 'WriteBuffer[str] | None' = None, max_cols: 'int | None' = None, memory_usage: 'bool | str | None' = None, show_counts: 'bool | None' = None, null_counts: 'bool | None' = None) -> 'None'
- Print a concise summary of a DataFrame.
This method prints information about a DataFrame including
the index dtype and columns, non-null values and memory usage.
Parameters
----------
data : DataFrame
DataFrame to print information about.
verbose : bool, optional
Whether to print the full summary. By default, the setting in
``pandas.options.display.max_info_columns`` is followed.
buf : writable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to
sys.stdout. Pass a writable buffer if you need to further process
the output. max_cols : int, optional
When to switch from the verbose to the truncated output. If the
DataFrame has more than `max_cols` columns, the truncated output
is used. By default, the setting in
``pandas.options.display.max_info_columns`` is used.
memory_usage : bool, str, optional
Specifies whether total memory usage of the DataFrame
elements (including the index) should be displayed. By default,
this follows the ``pandas.options.display.memory_usage`` setting.
True always show memory usage. False never shows memory usage.
A value of 'deep' is equivalent to "True with deep introspection".
Memory usage is shown in human-readable units (base-2
representation). Without deep introspection a memory estimation is
made based in column dtype and number of rows assuming values
consume the same memory amount for corresponding dtypes. With deep
memory introspection, a real memory usage calculation is performed
at the cost of computational resources.
show_counts : bool, optional
Whether to show the non-null counts. By default, this is shown
only if the DataFrame is smaller than
``pandas.options.display.max_info_rows`` and
``pandas.options.display.max_info_columns``. A value of True always
shows the counts, and False never shows the counts.
null_counts : bool, optional
.. deprecated:: 1.2.0
Use show_counts instead.
Returns
-------
None
This method prints a summary of a DataFrame and returns None.
See Also
--------
DataFrame.describe: Generate descriptive statistics of DataFrame
columns.
DataFrame.memory_usage: Memory usage of DataFrame columns.
Examples
--------
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
... "float_col": float_values})
>>> df
int_col text_col float_col
0 1 alpha 0.00
1 2 beta 0.25
2 3 gamma 0.50
3 4 delta 0.75
4 5 epsilon 1.00
Prints information of all columns:
>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 int_col 5 non-null int64
1 text_col 5 non-null object
2 float_col 5 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes
Prints a summary of columns count and its dtypes but not per column
information:
>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes
Pipe output of DataFrame.info to buffer instead of sys.stdout, get
buffer content and writes to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
... encoding="utf-8") as f: # doctest: +SKIP
... f.write(s)
260
The `memory_usage` parameter allows deep introspection mode, specially
useful for big DataFrames and fine-tune memory optimization:
>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> df = pd.DataFrame({
... 'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
... 'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
... 'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null object
1 column_2 1000000 non-null object
2 column_3 1000000 non-null object
dtypes: object(3)
memory usage: 22.9+ MB
>>> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column_1 1000000 non-null object
1 column_2 1000000 non-null object
2 column_3 1000000 non-null object
dtypes: object(3)
memory usage: 165.9 MB
- insert(self, loc: 'int', column: 'Hashable', value: 'Scalar | AnyArrayLike', allow_duplicates: 'bool' = False) -> 'None'
- Insert column into DataFrame at specified location.
Raises a ValueError if `column` is already contained in the DataFrame,
unless `allow_duplicates` is set to True.
Parameters
----------
loc : int
Insertion index. Must verify 0 <= loc <= len(columns).
column : str, number, or hashable object
Label of the inserted column.
value : Scalar, Series, or array-like
allow_duplicates : bool, optional default False
See Also
--------
Index.insert : Insert new item by index.
Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
col1 col2
0 1 3
1 2 4
>>> df.insert(1, "newcol", [99, 99])
>>> df
col1 newcol col2
0 1 99 3
1 2 99 4
>>> df.insert(0, "col1", [100, 100], allow_duplicates=True)
>>> df
col1 col1 newcol col2
0 100 1 99 3
1 100 2 99 4
Notice that pandas uses index alignment in case of `value` from type `Series`:
>>> df.insert(0, "col0", pd.Series([5, 6], index=[1, 2]))
>>> df
col0 col1 col1 newcol col2
0 NaN 100 1 99 3
1 5.0 100 2 99 4
- interpolate(self: 'DataFrame', method: 'str' = 'linear', axis: 'Axis' = 0, limit: 'int | None' = None, inplace: 'bool' = False, limit_direction: 'str | None' = None, limit_area: 'str | None' = None, downcast: 'str | None' = None, **kwargs) -> 'DataFrame | None'
- Fill NaN values using an interpolation method.
Please note that only ``method='linear'`` is supported for
DataFrame/Series with a MultiIndex.
Parameters
----------
method : str, default 'linear'
Interpolation technique to use. One of:
* 'linear': Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
* 'time': Works on daily and higher resolution data to interpolate
given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
'barycentric', 'polynomial': Passed to
`scipy.interpolate.interp1d`. These methods use the numerical
values of the index. Both 'polynomial' and 'spline' require that
you also specify an `order` (int), e.g.
``df.interpolate(method='polynomial', order=5)``.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima',
'cubicspline': Wrappers around the SciPy interpolation methods of
similar names. See `Notes`.
* 'from_derivatives': Refers to
`scipy.interpolate.BPoly.from_derivatives` which
replaces 'piecewise_polynomial' interpolation method in
scipy 0.18.
axis : {{0 or 'index', 1 or 'columns', None}}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {{'forward', 'backward', 'both'}}, Optional
Consecutive NaNs will be filled in this direction.
If limit is specified:
* If 'method' is 'pad' or 'ffill', 'limit_direction' must be 'forward'.
* If 'method' is 'backfill' or 'bfill', 'limit_direction' must be
'backwards'.
If 'limit' is not specified:
* If 'method' is 'backfill' or 'bfill', the default is 'backward'
* else the default is 'forward'
.. versionchanged:: 1.1.0
raises ValueError if `limit_direction` is 'forward' or 'both' and
method is 'backfill' or 'bfill'.
raises ValueError if `limit_direction` is 'backward' or 'both' and
method is 'pad' or 'ffill'.
limit_area : {{`None`, 'inside', 'outside'}}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.
* ``None``: No fill restriction.
* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
``**kwargs`` : optional
Keyword arguments to pass on to the interpolating function.
Returns
-------
Series or DataFrame or None
Returns the same object type as the caller, interpolated at
some or all ``NaN`` values or None if ``inplace=True``.
See Also
--------
fillna : Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator : Piecewise cubic polynomials
(Akima interpolator).
scipy.interpolate.BPoly.from_derivatives : Piecewise polynomial in the
Bernstein basis.
scipy.interpolate.interp1d : Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator : Interpolate polynomial (Krogh
interpolator).
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
interpolation.
scipy.interpolate.CubicSpline : Cubic spline data interpolator.
Notes
-----
The 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
`SciPy documentation
<https://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `SciPy tutorial
<https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.
Examples
--------
Filling in ``NaN`` in a :class:`~pandas.Series` via linear
interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0 0.0
1 1.0
2 NaN
3 3.0
dtype: float64
>>> s.interpolate()
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
Filling in ``NaN`` in a Series by padding, but filling at most two
consecutive ``NaN`` at a time.
>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object
Filling in ``NaN`` in a Series via polynomial interpolation or splines:
Both 'polynomial' and 'spline' methods require that you also specify
an ``order`` (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0 0.000000
1 2.000000
2 4.666667
3 8.000000
dtype: float64
Fill the DataFrame forward (that is, going down) along each column
using linear interpolation.
Note how the last entry in column 'a' is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column 'b' remains ``NaN``, because there
is no entry before it to use for interpolation.
>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
... (np.nan, 2.0, np.nan, np.nan),
... (2.0, 3.0, np.nan, 9.0),
... (np.nan, 4.0, -4.0, 16.0)],
... columns=list('abcd'))
>>> df
a b c d
0 0.0 NaN -1.0 1.0
1 NaN 2.0 NaN NaN
2 2.0 3.0 NaN 9.0
3 NaN 4.0 -4.0 16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d
0 0.0 NaN -1.0 1.0
1 1.0 2.0 -2.0 5.0
2 2.0 3.0 -3.0 9.0
3 2.0 4.0 -4.0 16.0
Using polynomial interpolation.
>>> df['d'].interpolate(method='polynomial', order=2)
0 1.0
1 4.0
2 9.0
3 16.0
Name: d, dtype: float64
- isin(self, values) -> 'DataFrame'
- Whether each element in the DataFrame is contained in values.
Parameters
----------
values : iterable, Series, DataFrame or dict
The result will only be true at a location if all the
labels match. If `values` is a Series, that's the index. If
`values` is a dict, the keys must be the column names,
which must match. If `values` is a DataFrame,
then both the index and column labels must match.
Returns
-------
DataFrame
DataFrame of booleans showing whether each element in the DataFrame
is contained in values.
See Also
--------
DataFrame.eq: Equality test for DataFrame.
Series.isin: Equivalent method on Series.
Series.str.contains: Test if pattern or regex is contained within a
string of a Series or Index.
Examples
--------
>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
... index=['falcon', 'dog'])
>>> df
num_legs num_wings
falcon 2 2
dog 4 0
When ``values`` is a list check whether every value in the DataFrame
is present in the list (which animals have 0 or 2 legs or wings)
>>> df.isin([0, 2])
num_legs num_wings
falcon True True
dog False True
To check if ``values`` is *not* in the DataFrame, use the ``~`` operator:
>>> ~df.isin([0, 2])
num_legs num_wings
falcon False False
dog True False
When ``values`` is a dict, we can pass values to check for each
column separately:
>>> df.isin({'num_wings': [0, 3]})
num_legs num_wings
falcon False False
dog False True
When ``values`` is a Series or DataFrame the index and column must
match. Note that 'falcon' does not match based on the number of legs
in other.
>>> other = pd.DataFrame({'num_legs': [8, 3], 'num_wings': [0, 2]},
... index=['spider', 'falcon'])
>>> df.isin(other)
num_legs num_wings
falcon False True
dog False False
- isna(self) -> 'DataFrame'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or :attr:`numpy.NaN`, gets mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
DataFrame
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
See Also
--------
DataFrame.isnull : Alias of isna.
DataFrame.notna : Boolean inverse of isna.
DataFrame.dropna : Omit axes labels with missing values.
isna : Top-level isna.
Examples
--------
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.isna()
0 False
1 False
2 True
dtype: bool
- isnull(self) -> 'DataFrame'
- DataFrame.isnull is an alias for DataFrame.isna.
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or :attr:`numpy.NaN`, gets mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
DataFrame
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
See Also
--------
DataFrame.isnull : Alias of isna.
DataFrame.notna : Boolean inverse of isna.
DataFrame.dropna : Omit axes labels with missing values.
isna : Top-level isna.
Examples
--------
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.isna()
0 False
1 False
2 True
dtype: bool
- items(self) -> 'Iterable[tuple[Hashable, Series]]'
- Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with
the column name and the content as a Series.
Yields
------
label : object
The column names for the DataFrame being iterated over.
content : Series
The column entries belonging to each label, as a Series.
See Also
--------
DataFrame.iterrows : Iterate over DataFrame rows as
(index, Series) pairs.
DataFrame.itertuples : Iterate over DataFrame rows as namedtuples
of the values.
Examples
--------
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
... 'population': [1864, 22000, 80000]},
... index=['panda', 'polar', 'koala'])
>>> df
species population
panda bear 1864
polar bear 22000
koala marsupial 80000
>>> for label, content in df.items():
... print(f'label: {label}')
... print(f'content: {content}', sep='\n')
...
label: species
content:
panda bear
polar bear
koala marsupial
Name: species, dtype: object
label: population
content:
panda 1864
polar 22000
koala 80000
Name: population, dtype: int64
- iteritems(self) -> 'Iterable[tuple[Hashable, Series]]'
- Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with
the column name and the content as a Series.
Yields
------
label : object
The column names for the DataFrame being iterated over.
content : Series
The column entries belonging to each label, as a Series.
See Also
--------
DataFrame.iterrows : Iterate over DataFrame rows as
(index, Series) pairs.
DataFrame.itertuples : Iterate over DataFrame rows as namedtuples
of the values.
Examples
--------
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
... 'population': [1864, 22000, 80000]},
... index=['panda', 'polar', 'koala'])
>>> df
species population
panda bear 1864
polar bear 22000
koala marsupial 80000
>>> for label, content in df.items():
... print(f'label: {label}')
... print(f'content: {content}', sep='\n')
...
label: species
content:
panda bear
polar bear
koala marsupial
Name: species, dtype: object
label: population
content:
panda 1864
polar 22000
koala 80000
Name: population, dtype: int64
- iterrows(self) -> 'Iterable[tuple[Hashable, Series]]'
- Iterate over DataFrame rows as (index, Series) pairs.
Yields
------
index : label or tuple of label
The index of the row. A tuple for a `MultiIndex`.
data : Series
The data of the row as a Series.
See Also
--------
DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.
DataFrame.items : Iterate over (column name, Series) pairs.
Notes
-----
1. Because ``iterrows`` returns a Series for each row,
it does **not** preserve dtypes across the rows (dtypes are
preserved across columns for DataFrames). For example,
>>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
>>> row = next(df.iterrows())[1]
>>> row
int 1.0
float 1.5
Name: 0, dtype: float64
>>> print(row['int'].dtype)
float64
>>> print(df['int'].dtype)
int64
To preserve dtypes while iterating over the rows, it is better
to use :meth:`itertuples` which returns namedtuples of the values
and which is generally faster than ``iterrows``.
2. You should **never modify** something you are iterating over.
This is not guaranteed to work in all cases. Depending on the
data types, the iterator returns a copy and not a view, and writing
to it will have no effect.
- itertuples(self, index: 'bool' = True, name: 'str | None' = 'Pandas') -> 'Iterable[tuple[Any, ...]]'
- Iterate over DataFrame rows as namedtuples.
Parameters
----------
index : bool, default True
If True, return the index as the first element of the tuple.
name : str or None, default "Pandas"
The name of the returned namedtuples or None to return regular
tuples.
Returns
-------
iterator
An object to iterate over namedtuples for each row in the
DataFrame with the first field possibly being the index and
following fields being the column values.
See Also
--------
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series)
pairs.
DataFrame.items : Iterate over (column name, Series) pairs.
Notes
-----
The column names will be renamed to positional names if they are
invalid Python identifiers, repeated, or start with an underscore.
On python versions < 3.7 regular tuples are returned for DataFrames
with a large number of columns (>254).
Examples
--------
>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
... index=['dog', 'hawk'])
>>> df
num_legs num_wings
dog 4 0
hawk 2 2
>>> for row in df.itertuples():
... print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)
By setting the `index` parameter to False we can remove the index
as the first element of the tuple:
>>> for row in df.itertuples(index=False):
... print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)
With the `name` parameter set we set a custom name for the yielded
namedtuples:
>>> for row in df.itertuples(name='Animal'):
... print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
- join(self, other: 'DataFrame | Series', on: 'IndexLabel | None' = None, how: 'str' = 'left', lsuffix: 'str' = '', rsuffix: 'str' = '', sort: 'bool' = False) -> 'DataFrame'
- Join columns of another DataFrame.
Join columns with `other` DataFrame either on index or on a key
column. Efficiently join multiple DataFrame objects by index at once by
passing a list.
Parameters
----------
other : DataFrame, Series, or list of DataFrame
Index should be similar to one of the columns in this one. If a
Series is passed, its name attribute must be set, and that will be
used as the column name in the resulting joined DataFrame.
on : str, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index
in `other`, otherwise joins index-on-index. If multiple
values given, the `other` DataFrame must have a MultiIndex. Can
pass an array as the join key if it is not already contained in
the calling DataFrame. Like an Excel VLOOKUP operation.
how : {'left', 'right', 'outer', 'inner'}, default 'left'
How to handle the operation of the two objects.
* left: use calling frame's index (or column if on is specified)
* right: use `other`'s index.
* outer: form union of calling frame's index (or column if on is
specified) with `other`'s index, and sort it.
lexicographically.
* inner: form intersection of calling frame's index (or column if
on is specified) with `other`'s index, preserving the order
of the calling's one.
* cross: creates the cartesian product from both frames, preserves the order
of the left keys.
.. versionadded:: 1.2.0
lsuffix : str, default ''
Suffix to use from left frame's overlapping columns.
rsuffix : str, default ''
Suffix to use from right frame's overlapping columns.
sort : bool, default False
Order result DataFrame lexicographically by the join key. If False,
the order of the join key depends on the join type (how keyword).
Returns
-------
DataFrame
A dataframe containing columns from both the caller and `other`.
See Also
--------
DataFrame.merge : For column(s)-on-column(s) operations.
Notes
-----
Parameters `on`, `lsuffix`, and `rsuffix` are not supported when
passing a list of `DataFrame` objects.
Support for specifying index levels as the `on` parameter was added
in version 0.23.0.
Examples
--------
>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
key A
0 K0 A0
1 K1 A1
2 K2 A2
3 K3 A3
4 K4 A4
5 K5 A5
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
... 'B': ['B0', 'B1', 'B2']})
>>> other
key B
0 K0 B0
1 K1 B1
2 K2 B2
Join DataFrames using their indexes.
>>> df.join(other, lsuffix='_caller', rsuffix='_other')
key_caller A key_other B
0 K0 A0 K0 B0
1 K1 A1 K1 B1
2 K2 A2 K2 B2
3 K3 A3 NaN NaN
4 K4 A4 NaN NaN
5 K5 A5 NaN NaN
If we want to join using the key columns, we need to set key to be
the index in both `df` and `other`. The joined DataFrame will have
key as its index.
>>> df.set_index('key').join(other.set_index('key'))
A B
key
K0 A0 B0
K1 A1 B1
K2 A2 B2
K3 A3 NaN
K4 A4 NaN
K5 A5 NaN
Another option to join using the key columns is to use the `on`
parameter. DataFrame.join always uses `other`'s index but we can use
any column in `df`. This method preserves the original DataFrame's
index in the result.
>>> df.join(other.set_index('key'), on='key')
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 NaN
4 K4 A4 NaN
5 K5 A5 NaN
Using non-unique key values shows how they are matched.
>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
key A
0 K0 A0
1 K1 A1
2 K1 A2
3 K3 A3
4 K0 A4
5 K1 A5
>>> df.join(other.set_index('key'), on='key')
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K1 A2 B1
3 K3 A3 NaN
4 K0 A4 B0
5 K1 A5 B1
- kurt(self, axis: 'Axis | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher's definition of
kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
- kurtosis = kurt(self, axis: 'Axis | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- le(self, other, axis='columns', level=None)
- Get Less than or equal to of dataframe and other, element-wise (binary operator `le`).
Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison
operators.
Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis
(rows or columns) and level for comparison.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns').
level : int or label
Broadcast across a level, matching Index values on the passed
MultiIndex level.
Returns
-------
DataFrame of bool
Result of the comparison.
See Also
--------
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
Notes
-----
Mismatched indices will be unioned together.
`NaN` values are considered different (i.e. `NaN` != `NaN`).
Examples
--------
>>> df = pd.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df
cost revenue
A 250 100
B 150 250
C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
cost revenue
A False True
B False False
C True False
>>> df.eq(100)
cost revenue
A False True
B False False
C True False
When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
cost revenue
A True True
B True False
C False True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
cost revenue
A True False
B True True
C True True
D True True
When comparing to an arbitrary sequence, the number of columns must
match the number elements in `other`:
>>> df == [250, 100]
cost revenue
A True True
B False False
C False False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
cost revenue
A True False
B False True
C True False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other
revenue
A 300
B 250
C 100
D 150
>>> df.gt(other)
cost revenue
A False False
B False False
C False True
D False False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
>>> df.le(df_multindex, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
- lookup(self, row_labels: 'Sequence[IndexLabel]', col_labels: 'Sequence[IndexLabel]') -> 'np.ndarray'
- Label-based "fancy indexing" function for DataFrame.
Given equal-length arrays of row and column labels, return an
array of the values corresponding to each (row, col) pair.
.. deprecated:: 1.2.0
DataFrame.lookup is deprecated,
use DataFrame.melt and DataFrame.loc instead.
For further details see
:ref:`Looking up values by index/column labels <indexing.lookup>`.
Parameters
----------
row_labels : sequence
The row labels to use for lookup.
col_labels : sequence
The column labels to use for lookup.
Returns
-------
numpy.ndarray
The found values.
- lt(self, other, axis='columns', level=None)
- Get Less than of dataframe and other, element-wise (binary operator `lt`).
Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison
operators.
Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis
(rows or columns) and level for comparison.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns').
level : int or label
Broadcast across a level, matching Index values on the passed
MultiIndex level.
Returns
-------
DataFrame of bool
Result of the comparison.
See Also
--------
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
Notes
-----
Mismatched indices will be unioned together.
`NaN` values are considered different (i.e. `NaN` != `NaN`).
Examples
--------
>>> df = pd.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df
cost revenue
A 250 100
B 150 250
C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
cost revenue
A False True
B False False
C True False
>>> df.eq(100)
cost revenue
A False True
B False False
C True False
When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
cost revenue
A True True
B True False
C False True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
cost revenue
A True False
B True True
C True True
D True True
When comparing to an arbitrary sequence, the number of columns must
match the number elements in `other`:
>>> df == [250, 100]
cost revenue
A True True
B False False
C False False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
cost revenue
A True False
B False True
C True False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other
revenue
A 300
B 250
C 100
D 150
>>> df.gt(other)
cost revenue
A False False
B False False
C False True
D False False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
>>> df.le(df_multindex, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
- mad(self, axis=None, skipna=True, level=None)
- Return the mean absolute deviation of the values over the requested axis.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
Returns
-------
Series or DataFrame (if level specified)
- mask(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=<no_default>)
- Replace values where the condition is True.
Parameters
----------
cond : bool Series/DataFrame, array-like, or callable
Where `cond` is False, keep the original value. Where
True, replace with corresponding value from `other`.
If `cond` is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn't check it).
other : scalar, Series/DataFrame, or callable
Entries where `cond` is True are replaced with
corresponding value from `other`.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn't check it).
inplace : bool, default False
Whether to perform the operation in place on the data.
axis : int, default None
Alignment axis if needed.
level : int, default None
Alignment level if needed.
errors : str, {'raise', 'ignore'}, default 'raise'
Note that currently this parameter won't affect
the results and will always coerce to a suitable dtype.
- 'raise' : allow exceptions to be raised.
- 'ignore' : suppress exceptions. On error return original object.
try_cast : bool, default None
Try to cast the result back to the input type (if possible).
.. deprecated:: 1.3.0
Manually cast back if necessary.
Returns
-------
Same type as caller or None if ``inplace=True``.
See Also
--------
:func:`DataFrame.where` : Return an object of same shape as
self.
Notes
-----
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if ``cond`` is ``False`` the
element is used; otherwise the corresponding element from the DataFrame
``other`` is used.
The signature for :func:`DataFrame.where` differs from
:func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
``np.where(m, df1, df2)``.
For further details and examples see the ``mask`` documentation in
:ref:`indexing <indexing.where_mask>`.
Examples
--------
>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
>>> s.mask(s > 0)
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
>>> s.where(s > 1, 10)
0 10
1 10
2 2
3 3
4 4
dtype: int64
>>> s.mask(s > 1, 10)
0 0
1 1
2 10
3 10
4 10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df)
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> df.where(m, -df) == np.where(m, df, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> df.where(m, -df) == df.mask(~m, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
- max(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the maximum of the values over the requested axis.
If you want the *index* of the maximum, use ``idxmax``. This is the equivalent of the ``numpy.ndarray`` method ``argmax``.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
>>> idx = pd.MultiIndex.from_arrays([
... ['warm', 'warm', 'cold', 'cold'],
... ['dog', 'falcon', 'fish', 'spider']],
... names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded animal
warm dog 4
falcon 2
cold fish 0
spider 8
Name: legs, dtype: int64
>>> s.max()
8
- mean(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the mean of the values over the requested axis.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
- median(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the median of the values over the requested axis.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
- melt(self, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level: 'Level | None' = None, ignore_index: 'bool' = True) -> 'DataFrame'
- Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
Parameters
----------
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or str, optional
If columns are a MultiIndex then use this level to melt.
ignore_index : bool, default True
If True, original index is ignored. If False, the original index is retained.
Index labels will be repeated as necessary.
.. versionadded:: 1.1.0
Returns
-------
DataFrame
Unpivoted DataFrame.
See Also
--------
melt : Identical method.
pivot_table : Create a spreadsheet-style pivot table as a DataFrame.
DataFrame.pivot : Return reshaped DataFrame organized
by given index / column values.
DataFrame.explode : Explode a DataFrame from list-like
columns to long format.
Notes
-----
Reference :ref:`the user guide <reshaping.melt>` for more examples.
Examples
--------
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
... 'B': {0: 1, 1: 3, 2: 5},
... 'C': {0: 2, 1: 4, 2: 6}})
>>> df
A B C
0 a 1 2
1 b 3 4
2 c 5 6
>>> df.melt(id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'])
A variable value
0 a B 1
1 b B 3
2 c B 5
3 a C 2
4 b C 4
5 c C 6
The names of 'variable' and 'value' columns can be customized:
>>> df.melt(id_vars=['A'], value_vars=['B'],
... var_name='myVarname', value_name='myValname')
A myVarname myValname
0 a B 1
1 b B 3
2 c B 5
Original index values can be kept around:
>>> df.melt(id_vars=['A'], value_vars=['B', 'C'], ignore_index=False)
A variable value
0 a B 1
1 b B 3
2 c B 5
0 a C 2
1 b C 4
2 c C 6
If you have multi-index columns:
>>> df.columns = [list('ABC'), list('DEF')]
>>> df
A B C
D E F
0 a 1 2
1 b 3 4
2 c 5 6
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B'])
A variable value
0 a B 1
1 b B 3
2 c B 5
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')])
(A, D) variable_0 variable_1 value
0 a B E 1
1 b B E 3
2 c B E 5
- memory_usage(self, index: 'bool' = True, deep: 'bool' = False) -> 'Series'
- Return the memory usage of each column in bytes.
The memory usage can optionally include the contribution of
the index and elements of `object` dtype.
This value is displayed in `DataFrame.info` by default. This can be
suppressed by setting ``pandas.options.display.memory_usage`` to False.
Parameters
----------
index : bool, default True
Specifies whether to include the memory usage of the DataFrame's
index in returned Series. If ``index=True``, the memory usage of
the index is the first item in the output.
deep : bool, default False
If True, introspect the data deeply by interrogating
`object` dtypes for system-level memory consumption, and include
it in the returned values.
Returns
-------
Series
A Series whose index is the original column names and whose values
is the memory usage of each column in bytes.
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of an
ndarray.
Series.memory_usage : Bytes consumed by a Series.
Categorical : Memory-efficient array for string values with
many repeated values.
DataFrame.info : Concise summary of a DataFrame.
Examples
--------
>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, np.ones(shape=5000, dtype=int).astype(t))
... for t in dtypes])
>>> df = pd.DataFrame(data)
>>> df.head()
int64 float64 complex128 object bool
0 1 1.0 1.0+0.0j 1 True
1 1 1.0 1.0+0.0j 1 True
2 1 1.0 1.0+0.0j 1 True
3 1 1.0 1.0+0.0j 1 True
4 1 1.0 1.0+0.0j 1 True
>>> df.memory_usage()
Index 128
int64 40000
float64 40000
complex128 80000
object 40000
bool 5000
dtype: int64
>>> df.memory_usage(index=False)
int64 40000
float64 40000
complex128 80000
object 40000
bool 5000
dtype: int64
The memory footprint of `object` dtype columns is ignored by default:
>>> df.memory_usage(deep=True)
Index 128
int64 40000
float64 40000
complex128 80000
object 180000
bool 5000
dtype: int64
Use a Categorical for efficient storage of an object-dtype column with
many repeated values.
>>> df['object'].astype('category').memory_usage(deep=True)
5244
- merge(self, right: 'DataFrame | Series', how: 'str' = 'inner', on: 'IndexLabel | None' = None, left_on: 'IndexLabel | None' = None, right_on: 'IndexLabel | None' = None, left_index: 'bool' = False, right_index: 'bool' = False, sort: 'bool' = False, suffixes: 'Suffixes' = ('_x', '_y'), copy: 'bool' = True, indicator: 'bool' = False, validate: 'str | None' = None) -> 'DataFrame'
- Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
.. warning::
If both key columns contain rows where the key is a null value, those
rows will be matched against each other. This is different from usual SQL
join behaviour and can lead to unexpected results.
Parameters
----------
right : DataFrame or named Series
Object to merge with.
how : {'left', 'right', 'outer', 'inner', 'cross'}, default 'inner'
Type of merge to be performed.
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
* right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
* outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
* cross: creates the cartesian product from both frames, preserves the order
of the left keys.
.. versionadded:: 1.2.0
on : label or list
Column or index level names to join on. These must be found in both
DataFrames. If `on` is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
left_on : label or list, or array-like
Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
right_on : label or list, or array-like
Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
left_index : bool, default False
Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
right_index : bool, default False
Use the index from the right DataFrame as the join key. Same caveats as
left_index.
sort : bool, default False
Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
suffixes : list-like, default is ("_x", "_y")
A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
`left` and `right` respectively. Pass a value of `None` instead
of a string to indicate that the column name from `left` or
`right` should be left as-is, with no suffix. At least one of the
values must not be None.
copy : bool, default True
If False, avoid copy if possible.
indicator : bool or str, default False
If True, adds a column to the output DataFrame called "_merge" with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of "left_only" for observations whose merge key only
appears in the left DataFrame, "right_only" for observations
whose merge key only appears in the right DataFrame, and "both"
if the observation's merge key is found in both DataFrames.
validate : str, optional
If specified, checks if merge is of specified type.
* "one_to_one" or "1:1": check if merge keys are unique in both
left and right datasets.
* "one_to_many" or "1:m": check if merge keys are unique in left
dataset.
* "many_to_one" or "m:1": check if merge keys are unique in right
dataset.
* "many_to_many" or "m:m": allowed, but does not result in checks.
Returns
-------
DataFrame
A DataFrame of the two merged objects.
See Also
--------
merge_ordered : Merge with optional filling/interpolation.
merge_asof : Merge on nearest keys.
DataFrame.join : Similar method using indices.
Notes
-----
Support for specifying index levels as the `on`, `left_on`, and
`right_on` parameters was added in version 0.23.0
Support for merging named Series objects was added in version 0.24.0
Examples
--------
>>> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [1, 2, 3, 5]})
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [5, 6, 7, 8]})
>>> df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
>>> df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
Merge df1 and df2 on the lkey and rkey columns. The value columns have
the default suffixes, _x and _y, appended.
>>> df1.merge(df2, left_on='lkey', right_on='rkey')
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
Merge DataFrames df1 and df2 with specified left and right suffixes
appended to any overlapping columns.
>>> df1.merge(df2, left_on='lkey', right_on='rkey',
... suffixes=('_left', '_right'))
lkey value_left rkey value_right
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have
any overlapping columns.
>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
Index(['value'], dtype='object')
>>> df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1
a b
0 foo 1
1 bar 2
>>> df2
a c
0 foo 3
1 baz 4
>>> df1.merge(df2, how='inner', on='a')
a b c
0 foo 1 3
>>> df1.merge(df2, how='left', on='a')
a b c
0 foo 1 3.0
1 bar 2 NaN
>>> df1 = pd.DataFrame({'left': ['foo', 'bar']})
>>> df2 = pd.DataFrame({'right': [7, 8]})
>>> df1
left
0 foo
1 bar
>>> df2
right
0 7
1 8
>>> df1.merge(df2, how='cross')
left right
0 foo 7
1 foo 8
2 bar 7
3 bar 8
- min(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the minimum of the values over the requested axis.
If you want the *index* of the minimum, use ``idxmin``. This is the equivalent of the ``numpy.ndarray`` method ``argmin``.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
>>> idx = pd.MultiIndex.from_arrays([
... ['warm', 'warm', 'cold', 'cold'],
... ['dog', 'falcon', 'fish', 'spider']],
... names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded animal
warm dog 4
falcon 2
cold fish 0
spider 8
Name: legs, dtype: int64
>>> s.min()
0
- mod(self, other, axis='columns', level=None, fill_value=None)
- Get Modulo of dataframe and other, element-wise (binary operator `mod`).
Equivalent to ``dataframe % other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rmod`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- mode(self, axis: 'Axis' = 0, numeric_only: 'bool' = False, dropna: 'bool' = True) -> 'DataFrame'
- Get the mode(s) of each element along the selected axis.
The mode of a set of values is the value that appears most often.
It can be multiple values.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to iterate over while searching for the mode:
* 0 or 'index' : get mode of each column
* 1 or 'columns' : get mode of each row.
numeric_only : bool, default False
If True, only apply to numeric columns.
dropna : bool, default True
Don't consider counts of NaN/NaT.
Returns
-------
DataFrame
The modes of each column or row.
See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Return the counts of values in a Series.
Examples
--------
>>> df = pd.DataFrame([('bird', 2, 2),
... ('mammal', 4, np.nan),
... ('arthropod', 8, 0),
... ('bird', 2, np.nan)],
... index=('falcon', 'horse', 'spider', 'ostrich'),
... columns=('species', 'legs', 'wings'))
>>> df
species legs wings
falcon bird 2 2.0
horse mammal 4 NaN
spider arthropod 8 0.0
ostrich bird 2 NaN
By default, missing values are not considered, and the mode of wings
are both 0 and 2. Because the resulting DataFrame has two rows,
the second row of ``species`` and ``legs`` contains ``NaN``.
>>> df.mode()
species legs wings
0 bird 2.0 0.0
1 NaN NaN 2.0
Setting ``dropna=False`` ``NaN`` values are considered and they can be
the mode (like for wings).
>>> df.mode(dropna=False)
species legs wings
0 bird 2 NaN
Setting ``numeric_only=True``, only the mode of numeric columns is
computed, and columns of other types are ignored.
>>> df.mode(numeric_only=True)
legs wings
0 2.0 0.0
1 NaN 2.0
To compute the mode over columns and not rows, use the axis parameter:
>>> df.mode(axis='columns', numeric_only=True)
0 1
falcon 2.0 NaN
horse 4.0 NaN
spider 0.0 8.0
ostrich 2.0 NaN
- mul(self, other, axis='columns', level=None, fill_value=None)
- Get Multiplication of dataframe and other, element-wise (binary operator `mul`).
Equivalent to ``dataframe * other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rmul`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- multiply = mul(self, other, axis='columns', level=None, fill_value=None)
- ne(self, other, axis='columns', level=None)
- Get Not equal to of dataframe and other, element-wise (binary operator `ne`).
Among flexible wrappers (`eq`, `ne`, `le`, `lt`, `ge`, `gt`) to comparison
operators.
Equivalent to `==`, `!=`, `<=`, `<`, `>=`, `>` with support to choose axis
(rows or columns) and level for comparison.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}, default 'columns'
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns').
level : int or label
Broadcast across a level, matching Index values on the passed
MultiIndex level.
Returns
-------
DataFrame of bool
Result of the comparison.
See Also
--------
DataFrame.eq : Compare DataFrames for equality elementwise.
DataFrame.ne : Compare DataFrames for inequality elementwise.
DataFrame.le : Compare DataFrames for less than inequality
or equality elementwise.
DataFrame.lt : Compare DataFrames for strictly less than
inequality elementwise.
DataFrame.ge : Compare DataFrames for greater than inequality
or equality elementwise.
DataFrame.gt : Compare DataFrames for strictly greater than
inequality elementwise.
Notes
-----
Mismatched indices will be unioned together.
`NaN` values are considered different (i.e. `NaN` != `NaN`).
Examples
--------
>>> df = pd.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df
cost revenue
A 250 100
B 150 250
C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
cost revenue
A False True
B False False
C True False
>>> df.eq(100)
cost revenue
A False True
B False False
C True False
When `other` is a :class:`Series`, the columns of a DataFrame are aligned
with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
cost revenue
A True True
B True False
C False True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
cost revenue
A True False
B True True
C True True
D True True
When comparing to an arbitrary sequence, the number of columns must
match the number elements in `other`:
>>> df == [250, 100]
cost revenue
A True True
B False False
C False False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
cost revenue
A True False
B False True
C True False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other
revenue
A 300
B 250
C 100
D 150
>>> df.gt(other)
cost revenue
A False False
B False False
C False True
D False False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
>>> df.le(df_multindex, level=1)
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
- nlargest(self, n: 'int', columns: 'IndexLabel', keep: 'str' = 'first') -> 'DataFrame'
- Return the first `n` rows ordered by `columns` in descending order.
Return the first `n` rows with the largest values in `columns`, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
``df.sort_values(columns, ascending=False).head(n)``, but more
performant.
Parameters
----------
n : int
Number of rows to return.
columns : label or list of labels
Column label(s) to order by.
keep : {'first', 'last', 'all'}, default 'first'
Where there are duplicate values:
- ``first`` : prioritize the first occurrence(s)
- ``last`` : prioritize the last occurrence(s)
- ``all`` : do not drop any duplicates, even it means
selecting more than `n` items.
Returns
-------
DataFrame
The first `n` rows ordered by the given columns in descending
order.
See Also
--------
DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in
ascending order.
DataFrame.sort_values : Sort DataFrame by the values.
DataFrame.head : Return the first `n` rows without re-ordering.
Notes
-----
This function cannot be used with all column types. For example, when
specifying columns with `object` or `category` dtypes, ``TypeError`` is
raised.
Examples
--------
>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
... 434000, 434000, 337000, 11300,
... 11300, 11300],
... 'GDP': [1937894, 2583560 , 12011, 4520, 12128,
... 17036, 182, 38, 311],
... 'alpha-2': ["IT", "FR", "MT", "MV", "BN",
... "IS", "NR", "TV", "AI"]},
... index=["Italy", "France", "Malta",
... "Maldives", "Brunei", "Iceland",
... "Nauru", "Tuvalu", "Anguilla"])
>>> df
population GDP alpha-2
Italy 59000000 1937894 IT
France 65000000 2583560 FR
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
Iceland 337000 17036 IS
Nauru 11300 182 NR
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
In the following example, we will use ``nlargest`` to select the three
rows having the largest values in column "population".
>>> df.nlargest(3, 'population')
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Malta 434000 12011 MT
When using ``keep='last'``, ties are resolved in reverse order:
>>> df.nlargest(3, 'population', keep='last')
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Brunei 434000 12128 BN
When using ``keep='all'``, all duplicate items are maintained:
>>> df.nlargest(3, 'population', keep='all')
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
To order by the largest values in column "population" and then "GDP",
we can specify multiple columns like in the next example.
>>> df.nlargest(3, ['population', 'GDP'])
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Brunei 434000 12128 BN
- notna(self) -> 'DataFrame'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to False
values.
Returns
-------
DataFrame
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
See Also
--------
DataFrame.notnull : Alias of notna.
DataFrame.isna : Boolean inverse of notna.
DataFrame.dropna : Omit axes labels with missing values.
notna : Top-level notna.
Examples
--------
Show which entries in a DataFrame are not NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.notna()
age born name toy
0 True False True False
1 True True True True
2 False True True True
Show which entries in a Series are not NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.notna()
0 True
1 True
2 False
dtype: bool
- notnull(self) -> 'DataFrame'
- DataFrame.notnull is an alias for DataFrame.notna.
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to False
values.
Returns
-------
DataFrame
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
See Also
--------
DataFrame.notnull : Alias of notna.
DataFrame.isna : Boolean inverse of notna.
DataFrame.dropna : Omit axes labels with missing values.
notna : Top-level notna.
Examples
--------
Show which entries in a DataFrame are not NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.notna()
age born name toy
0 True False True False
1 True True True True
2 False True True True
Show which entries in a Series are not NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.notna()
0 True
1 True
2 False
dtype: bool
- nsmallest(self, n: 'int', columns: 'IndexLabel', keep: 'str' = 'first') -> 'DataFrame'
- Return the first `n` rows ordered by `columns` in ascending order.
Return the first `n` rows with the smallest values in `columns`, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
``df.sort_values(columns, ascending=True).head(n)``, but more
performant.
Parameters
----------
n : int
Number of items to retrieve.
columns : list or str
Column name or names to order by.
keep : {'first', 'last', 'all'}, default 'first'
Where there are duplicate values:
- ``first`` : take the first occurrence.
- ``last`` : take the last occurrence.
- ``all`` : do not drop any duplicates, even it means
selecting more than `n` items.
Returns
-------
DataFrame
See Also
--------
DataFrame.nlargest : Return the first `n` rows ordered by `columns` in
descending order.
DataFrame.sort_values : Sort DataFrame by the values.
DataFrame.head : Return the first `n` rows without re-ordering.
Examples
--------
>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
... 434000, 434000, 337000, 337000,
... 11300, 11300],
... 'GDP': [1937894, 2583560 , 12011, 4520, 12128,
... 17036, 182, 38, 311],
... 'alpha-2': ["IT", "FR", "MT", "MV", "BN",
... "IS", "NR", "TV", "AI"]},
... index=["Italy", "France", "Malta",
... "Maldives", "Brunei", "Iceland",
... "Nauru", "Tuvalu", "Anguilla"])
>>> df
population GDP alpha-2
Italy 59000000 1937894 IT
France 65000000 2583560 FR
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
Iceland 337000 17036 IS
Nauru 337000 182 NR
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
In the following example, we will use ``nsmallest`` to select the
three rows having the smallest values in column "population".
>>> df.nsmallest(3, 'population')
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Iceland 337000 17036 IS
When using ``keep='last'``, ties are resolved in reverse order:
>>> df.nsmallest(3, 'population', keep='last')
population GDP alpha-2
Anguilla 11300 311 AI
Tuvalu 11300 38 TV
Nauru 337000 182 NR
When using ``keep='all'``, all duplicate items are maintained:
>>> df.nsmallest(3, 'population', keep='all')
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Iceland 337000 17036 IS
Nauru 337000 182 NR
To order by the smallest values in column "population" and then "GDP", we can
specify multiple columns like in the next example.
>>> df.nsmallest(3, ['population', 'GDP'])
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Nauru 337000 182 NR
- nunique(self, axis: 'Axis' = 0, dropna: 'bool' = True) -> 'Series'
- Count number of distinct elements in specified axis.
Return Series with number of distinct elements. Can ignore NaN
values.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to use. 0 or 'index' for row-wise, 1 or 'columns' for
column-wise.
dropna : bool, default True
Don't include NaN in the counts.
Returns
-------
Series
See Also
--------
Series.nunique: Method nunique for Series.
DataFrame.count: Count non-NA cells for each column or row.
Examples
--------
>>> df = pd.DataFrame({'A': [4, 5, 6], 'B': [4, 1, 1]})
>>> df.nunique()
A 3
B 2
dtype: int64
>>> df.nunique(axis=1)
0 1
1 2
2 2
dtype: int64
- pivot(self, index=None, columns=None, values=None) -> 'DataFrame'
- Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a "pivot" table) based on column values. Uses
unique values from specified `index` / `columns` to form axes of the
resulting DataFrame. This function does not support data
aggregation, multiple values will result in a MultiIndex in the
columns. See the :ref:`User Guide <reshaping>` for more on reshaping.
Parameters
----------
index : str or object or a list of str, optional
Column to use to make new frame's index. If None, uses
existing index.
.. versionchanged:: 1.1.0
Also accept list of index names.
columns : str or object or a list of str
Column to use to make new frame's columns.
.. versionchanged:: 1.1.0
Also accept list of columns names.
values : str, object or a list of the previous, optional
Column(s) to use for populating new frame's values. If not
specified, all remaining columns will be used and the result will
have hierarchically indexed columns.
Returns
-------
DataFrame
Returns reshaped DataFrame.
Raises
------
ValueError:
When there are any `index`, `columns` combinations with multiple
values. `DataFrame.pivot_table` when you need to aggregate.
See Also
--------
DataFrame.pivot_table : Generalization of pivot that can handle
duplicate values for one index/column pair.
DataFrame.unstack : Pivot based on the index values instead of a
column.
wide_to_long : Wide panel to long format. Less flexible but more
user-friendly than melt.
Notes
-----
For finer-tuned control, see hierarchical indexing documentation along
with the related stack/unstack methods.
Reference :ref:`the user guide <reshaping.pivot>` for more examples.
Examples
--------
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
... 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index='foo', columns='bar')['baz']
bar A B C
foo
one 1 2 3
two 4 5 6
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
baz zoo
bar A B C A B C
foo
one 1 2 3 x y z
two 4 5 6 q w t
You could also assign a list of column names or a list of index names.
>>> df = pd.DataFrame({
... "lev1": [1, 1, 1, 2, 2, 2],
... "lev2": [1, 1, 2, 1, 1, 2],
... "lev3": [1, 2, 1, 2, 1, 2],
... "lev4": [1, 2, 3, 4, 5, 6],
... "values": [0, 1, 2, 3, 4, 5]})
>>> df
lev1 lev2 lev3 lev4 values
0 1 1 1 1 0
1 1 1 2 2 1
2 1 2 1 3 2
3 2 1 2 4 3
4 2 1 1 5 4
5 2 2 2 6 5
>>> df.pivot(index="lev1", columns=["lev2", "lev3"],values="values")
lev2 1 2
lev3 1 2 1 2
lev1
1 0.0 1.0 2.0 NaN
2 4.0 3.0 NaN 5.0
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")
lev3 1 2
lev1 lev2
1 1 0.0 1.0
2 2.0 NaN
2 1 4.0 3.0
2 NaN 5.0
A ValueError is raised if there are any duplicates.
>>> df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
... "bar": ['A', 'A', 'B', 'C'],
... "baz": [1, 2, 3, 4]})
>>> df
foo bar baz
0 one A 1
1 one A 2
2 two B 3
3 two C 4
Notice that the first two rows are the same for our `index`
and `columns` arguments.
>>> df.pivot(index='foo', columns='bar', values='baz')
Traceback (most recent call last):
...
ValueError: Index contains duplicate entries, cannot reshape
- pivot_table(self, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True) -> 'DataFrame'
- Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects
(hierarchical indexes) on the index and columns of the result DataFrame.
Parameters
----------
values : column to aggregate, optional
index : column, Grouper, array, or list of the previous
If an array is passed, it must be the same length as the data. The
list can contain any of the other types (except list).
Keys to group by on the pivot table index. If an array is passed,
it is being used as the same manner as column values.
columns : column, Grouper, array, or list of the previous
If an array is passed, it must be the same length as the data. The
list can contain any of the other types (except list).
Keys to group by on the pivot table column. If an array is passed,
it is being used as the same manner as column values.
aggfunc : function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have
hierarchical columns whose top level are the function names
(inferred from the function objects themselves)
If dict is passed, the key is column to aggregate and value
is function or list of functions.
fill_value : scalar, default None
Value to replace missing values with (in the resulting pivot table,
after aggregation).
margins : bool, default False
Add all row / columns (e.g. for subtotal / grand totals).
dropna : bool, default True
Do not include columns whose entries are all NaN.
margins_name : str, default 'All'
Name of the row / column that will contain the totals
when margins is True.
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
.. versionchanged:: 0.25.0
sort : bool, default True
Specifies if the result should be sorted.
.. versionadded:: 1.3.0
Returns
-------
DataFrame
An Excel style pivot table.
See Also
--------
DataFrame.pivot : Pivot without aggregation that can handle
non-numeric data.
DataFrame.melt: Unpivot a DataFrame from wide to long format,
optionally leaving identifiers set.
wide_to_long : Wide panel to long format. Less flexible but more
user-friendly than melt.
Notes
-----
Reference :ref:`the user guide <reshaping.pivot>` for more examples.
Examples
--------
>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
... "bar", "bar", "bar", "bar"],
... "B": ["one", "one", "one", "two", "two",
... "one", "one", "two", "two"],
... "C": ["small", "large", "large", "small",
... "small", "large", "small", "small",
... "large"],
... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
This first example aggregates values by taking the sum.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum)
>>> table
C large small
A B
bar one 4.0 5.0
two 7.0 6.0
foo one 4.0 1.0
two NaN 6.0
We can also fill missing values using the `fill_value` parameter.
>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum, fill_value=0)
>>> table
C large small
A B
bar one 4 5
two 7 6
foo one 4 1
two 0 6
The next example aggregates by taking the mean across multiple columns.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
... aggfunc={'D': np.mean,
... 'E': np.mean})
>>> table
D E
A C
bar large 5.500000 7.500000
small 5.500000 8.500000
foo large 2.000000 4.500000
small 2.333333 4.333333
We can also calculate multiple types of aggregations for any given
value column.
>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
... aggfunc={'D': np.mean,
... 'E': [min, max, np.mean]})
>>> table
D E
mean max mean min
A C
bar large 5.500000 9 7.500000 6
small 5.500000 9 8.500000 8
foo large 2.000000 5 4.500000 4
small 2.333333 6 4.333333 2
- pop(self, item: 'Hashable') -> 'Series'
- Return item and drop from frame. Raise KeyError if not found.
Parameters
----------
item : label
Label of column to be popped.
Returns
-------
Series
Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', np.nan)],
... columns=('name', 'class', 'max_speed'))
>>> df
name class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal NaN
>>> df.pop('class')
0 bird
1 bird
2 mammal
3 mammal
Name: class, dtype: object
>>> df
name max_speed
0 falcon 389.0
1 parrot 24.0
2 lion 80.5
3 monkey NaN
- pow(self, other, axis='columns', level=None, fill_value=None)
- Get Exponential power of dataframe and other, element-wise (binary operator `pow`).
Equivalent to ``dataframe ** other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rpow`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- prod(self, axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
- Return the product of the values over the requested axis.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
By default, the product of an empty or all-NA Series is ``1``
>>> pd.Series([], dtype="float64").prod()
1.0
This can be controlled with the ``min_count`` parameter
>>> pd.Series([], dtype="float64").prod(min_count=1)
nan
Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
empty series identically.
>>> pd.Series([np.nan]).prod()
1.0
>>> pd.Series([np.nan]).prod(min_count=1)
nan
- product = prod(self, axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
- quantile(self, q=0.5, axis: 'Axis' = 0, numeric_only: 'bool' = True, interpolation: 'str' = 'linear')
- Return values at the given quantile over requested axis.
Parameters
----------
q : float or array-like, default 0.5 (50% quantile)
Value between 0 <= q <= 1, the quantile(s) to compute.
axis : {0, 1, 'index', 'columns'}, default 0
Equals 0 or 'index' for row-wise, 1 or 'columns' for column-wise.
numeric_only : bool, default True
If False, the quantile of datetime and timedelta data will be
computed as well.
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points `i` and `j`:
* linear: `i + (j - i) * fraction`, where `fraction` is the
fractional part of the index surrounded by `i` and `j`.
* lower: `i`.
* higher: `j`.
* nearest: `i` or `j` whichever is nearest.
* midpoint: (`i` + `j`) / 2.
Returns
-------
Series or DataFrame
If ``q`` is an array, a DataFrame will be returned where the
index is ``q``, the columns are the columns of self, and the
values are the quantiles.
If ``q`` is a float, a Series will be returned where the
index is the columns of self and the values are the quantiles.
See Also
--------
core.window.Rolling.quantile: Rolling quantile.
numpy.percentile: Numpy function to compute the percentile.
Examples
--------
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
... columns=['a', 'b'])
>>> df.quantile(.1)
a 1.3
b 3.7
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
a b
0.1 1.3 3.7
0.5 2.5 55.0
Specifying `numeric_only=False` will also compute the quantile of
datetime and timedelta data.
>>> df = pd.DataFrame({'A': [1, 2],
... 'B': [pd.Timestamp('2010'),
... pd.Timestamp('2011')],
... 'C': [pd.Timedelta('1 days'),
... pd.Timedelta('2 days')]})
>>> df.quantile(0.5, numeric_only=False)
A 1.5
B 2010-07-02 12:00:00
C 1 days 12:00:00
Name: 0.5, dtype: object
- query(self, expr: 'str', inplace: 'bool' = False, **kwargs)
- Query the columns of a DataFrame with a boolean expression.
Parameters
----------
expr : str
The query string to evaluate.
You can refer to variables
in the environment by prefixing them with an '@' character like
``@a + b``.
You can refer to column names that are not valid Python variable names
by surrounding them in backticks. Thus, column names containing spaces
or punctuations (besides underscores) or starting with digits must be
surrounded by backticks. (For example, a column named "Area (cm^2)" would
be referenced as ```Area (cm^2)```). Column names which are Python keywords
(like "list", "for", "import", etc) cannot be used.
For example, if one of your columns is called ``a a`` and you want
to sum it with ``b``, your query should be ```a a` + b``.
.. versionadded:: 0.25.0
Backtick quoting introduced.
.. versionadded:: 1.0.0
Expanding functionality of backtick quoting for more than only spaces.
inplace : bool
Whether the query should modify the data in place or return
a modified copy.
**kwargs
See the documentation for :func:`eval` for complete details
on the keyword arguments accepted by :meth:`DataFrame.query`.
Returns
-------
DataFrame or None
DataFrame resulting from the provided query expression or
None if ``inplace=True``.
See Also
--------
eval : Evaluate a string describing operations on
DataFrame columns.
DataFrame.eval : Evaluate a string describing operations on
DataFrame columns.
Notes
-----
The result of the evaluation of this expression is first passed to
:attr:`DataFrame.loc` and if that fails because of a
multidimensional key (e.g., a DataFrame) then the result will be passed
to :meth:`DataFrame.__getitem__`.
This method uses the top-level :func:`eval` function to
evaluate the passed query.
The :meth:`~pandas.DataFrame.query` method uses a slightly
modified Python syntax by default. For example, the ``&`` and ``|``
(bitwise) operators have the precedence of their boolean cousins,
:keyword:`and` and :keyword:`or`. This *is* syntactically valid Python,
however the semantics are different.
You can change the semantics of the expression by passing the keyword
argument ``parser='python'``. This enforces the same semantics as
evaluation in Python space. Likewise, you can pass ``engine='python'``
to evaluate an expression using Python itself as a backend. This is not
recommended as it is inefficient compared to using ``numexpr`` as the
engine.
The :attr:`DataFrame.index` and
:attr:`DataFrame.columns` attributes of the
:class:`~pandas.DataFrame` instance are placed in the query namespace
by default, which allows you to treat both the index and columns of the
frame as a column in the frame.
The identifier ``index`` is used for the frame index; you can also
use the name of the index to identify it in a query. Please note that
Python keywords may not be used as identifiers.
For further details and examples see the ``query`` documentation in
:ref:`indexing <indexing.query>`.
*Backtick quoted variables*
Backtick quoted variables are parsed as literal Python code and
are converted internally to a Python valid identifier.
This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick
quoted string are replaced by strings that are allowed as a Python identifier.
These characters include all operators in Python, the space character, the
question mark, the exclamation mark, the dollar sign, and the euro sign.
For other characters that fall outside the ASCII range (U+0001..U+007F)
and those that are not further specified in PEP 3131,
the query parser will raise an error.
This excludes whitespace different than the space character,
but also the hashtag (as it is used for comments) and the backtick
itself (backtick can also not be escaped).
In a special case, quotes that make a pair around a backtick can
confuse the parser.
For example, ```it's` > `that's``` will raise an error,
as it forms a quoted string (``'s > `that'``) with a backtick inside.
See also the Python documentation about lexical analysis
(https://docs.python.org/3/reference/lexical_analysis.html)
in combination with the source code in :mod:`pandas.core.computation.parsing`.
Examples
--------
>>> df = pd.DataFrame({'A': range(1, 6),
... 'B': range(10, 0, -2),
... 'C C': range(10, 5, -1)})
>>> df
A B C C
0 1 10 10
1 2 8 9
2 3 6 8
3 4 4 7
4 5 2 6
>>> df.query('A > B')
A B C C
4 5 2 6
The previous expression is equivalent to
>>> df[df.A > df.B]
A B C C
4 5 2 6
For columns with spaces in their name, you can use backtick quoting.
>>> df.query('B == `C C`')
A B C C
0 1 10 10
The previous expression is equivalent to
>>> df[df.B == df['C C']]
A B C C
0 1 10 10
- radd(self, other, axis='columns', level=None, fill_value=None)
- Get Addition of dataframe and other, element-wise (binary operator `radd`).
Equivalent to ``other + dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `add`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- rdiv = rtruediv(self, other, axis='columns', level=None, fill_value=None)
- reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
- Conform Series/DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
``copy=False``.
Parameters
----------
keywords for axes : array-like, optional
New labels / index to conform to, should be specified using
keywords. Preferably an Index object to avoid duplicating data.
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don't fill gaps
* pad / ffill: Propagate last valid observation forward to next
valid.
* backfill / bfill: Use next valid observation to fill gap.
* nearest: Use nearest valid observations to fill gap.
copy : bool, default True
Return a new object, even if the passed indexes are the same.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any
"compatible" value.
limit : int, default None
Maximum number of consecutive elements to forward or backward fill.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations most
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
Series/DataFrame with changed index.
See Also
--------
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex_like : Change to same indices as other DataFrame.
Examples
--------
``DataFrame.reindex`` supports two calling conventions
* ``(index=index_labels, columns=column_labels, ...)``
* ``(labels, axis={'index', 'columns'}, ...)``
We *highly* recommend using keyword arguments to clarify your
intent.
Create a dataframe with some fictional data.
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
... index=index)
>>> df
http_status response_time
Firefox 200 0.04
Chrome 200 0.02
Safari 404 0.07
IE10 404 0.08
Konqueror 301 1.00
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned ``NaN``.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
... 'Chrome']
>>> df.reindex(new_index)
http_status response_time
Safari 404.0 0.07
Iceweasel NaN NaN
Comodo Dragon NaN NaN
IE10 404.0 0.08
Chrome 200.0 0.02
We can fill in the missing values by passing a value to
the keyword ``fill_value``. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
``method`` to fill the ``NaN`` values.
>>> df.reindex(new_index, fill_value=0)
http_status response_time
Safari 404 0.07
Iceweasel 0 0.00
Comodo Dragon 0 0.00
IE10 404 0.08
Chrome 200 0.02
>>> df.reindex(new_index, fill_value='missing')
http_status response_time
Safari 404 0.07
Iceweasel missing missing
Comodo Dragon missing missing
IE10 404 0.08
Chrome 200 0.02
We can also reindex the columns.
>>> df.reindex(columns=['http_status', 'user_agent'])
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
Or we can use "axis-style" keyword arguments
>>> df.reindex(['http_status', 'user_agent'], axis="columns")
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
To further illustrate the filling functionality in
``reindex``, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
... index=date_index)
>>> df2
prices
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
Suppose we decide to expand the dataframe to cover a wider
date range.
>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
prices
2009-12-29 NaN
2009-12-30 NaN
2009-12-31 NaN
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
The index entries that did not have a value in the original data frame
(for example, '2009-12-29') are by default filled with ``NaN``.
If desired, we can fill in the missing values using one of several
options.
For example, to back-propagate the last valid value to fill the ``NaN``
values, pass ``bfill`` as an argument to the ``method`` keyword.
>>> df2.reindex(date_index2, method='bfill')
prices
2009-12-29 100.0
2009-12-30 100.0
2009-12-31 100.0
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
Please note that the ``NaN`` value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the ``NaN`` values present
in the original dataframe, use the ``fillna()`` method.
See the :ref:`user guide <basics.reindexing>` for more.
- rename(self, mapper: 'Renamer | None' = None, *, index: 'Renamer | None' = None, columns: 'Renamer | None' = None, axis: 'Axis | None' = None, copy: 'bool' = True, inplace: 'bool' = False, level: 'Level | None' = None, errors: 'str' = 'ignore') -> 'DataFrame | None'
- Alter axes labels.
Function / dict values must be unique (1-to-1). Labels not contained in
a dict / Series will be left as-is. Extra labels listed don't throw an
error.
See the :ref:`user guide <basics.rename>` for more.
Parameters
----------
mapper : dict-like or function
Dict-like or function transformations to apply to
that axis' values. Use either ``mapper`` and ``axis`` to
specify the axis to target with ``mapper``, or ``index`` and
``columns``.
index : dict-like or function
Alternative to specifying axis (``mapper, axis=0``
is equivalent to ``index=mapper``).
columns : dict-like or function
Alternative to specifying axis (``mapper, axis=1``
is equivalent to ``columns=mapper``).
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis to target with ``mapper``. Can be either the axis name
('index', 'columns') or number (0, 1). The default is 'index'.
copy : bool, default True
Also copy underlying data.
inplace : bool, default False
Whether to return a new DataFrame. If True then value of copy is
ignored.
level : int or level name, default None
In case of a MultiIndex, only rename labels in the specified
level.
errors : {'ignore', 'raise'}, default 'ignore'
If 'raise', raise a `KeyError` when a dict-like `mapper`, `index`,
or `columns` contains labels that are not present in the Index
being transformed.
If 'ignore', existing keys will be renamed and extra keys will be
ignored.
Returns
-------
DataFrame or None
DataFrame with the renamed axis labels or None if ``inplace=True``.
Raises
------
KeyError
If any of the labels is not found in the selected axis and
"errors='raise'".
See Also
--------
DataFrame.rename_axis : Set the name of the axis.
Examples
--------
``DataFrame.rename`` supports two calling conventions
* ``(index=index_mapper, columns=columns_mapper, ...)``
* ``(mapper, axis={'index', 'columns'}, ...)``
We *highly* recommend using keyword arguments to clarify your
intent.
Rename columns using a mapping:
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})
a c
0 1 4
1 2 5
2 3 6
Rename index using a mapping:
>>> df.rename(index={0: "x", 1: "y", 2: "z"})
A B
x 1 4
y 2 5
z 3 6
Cast index labels to a different type:
>>> df.index
RangeIndex(start=0, stop=3, step=1)
>>> df.rename(index=str).index
Index(['0', '1', '2'], dtype='object')
>>> df.rename(columns={"A": "a", "B": "b", "C": "c"}, errors="raise")
Traceback (most recent call last):
KeyError: ['C'] not found in axis
Using axis-style parameters:
>>> df.rename(str.lower, axis='columns')
a b
0 1 4
1 2 5
2 3 6
>>> df.rename({1: 2, 2: 4}, axis='index')
A B
0 1 4
2 2 5
4 3 6
- reorder_levels(self, order: 'Sequence[Axis]', axis: 'Axis' = 0) -> 'DataFrame'
- Rearrange index levels using input order. May not drop or duplicate levels.
Parameters
----------
order : list of int or list of str
List representing new level order. Reference level by number
(position) or by key (label).
axis : {0 or 'index', 1 or 'columns'}, default 0
Where to reorder levels.
Returns
-------
DataFrame
Examples
--------
>>> data = {
... "class": ["Mammals", "Mammals", "Reptiles"],
... "diet": ["Omnivore", "Carnivore", "Carnivore"],
... "species": ["Humans", "Dogs", "Snakes"],
... }
>>> df = pd.DataFrame(data, columns=["class", "diet", "species"])
>>> df = df.set_index(["class", "diet"])
>>> df
species
class diet
Mammals Omnivore Humans
Carnivore Dogs
Reptiles Carnivore Snakes
Let's reorder the levels of the index:
>>> df.reorder_levels(["diet", "class"])
species
diet class
Omnivore Mammals Humans
Carnivore Mammals Dogs
Reptiles Snakes
- replace(self, to_replace=None, value=<no_default>, inplace: 'bool' = False, limit=None, regex: 'bool' = False, method: 'str | lib.NoDefault' = <no_default>)
- Replace values given in `to_replace` with `value`.
Values of the DataFrame are replaced with other values dynamically.
This differs from updating with ``.loc`` or ``.iloc``, which require
you to specify a location to update with some value.
Parameters
----------
to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
* numeric, str or regex:
- numeric: numeric values equal to `to_replace` will be
replaced with `value`
- str: string exactly matching `to_replace` will be replaced
with `value`
- regex: regexs matching `to_replace` will be replaced with
`value`
* list of str, regex, or numeric:
- First, if `to_replace` and `value` are both lists, they
**must** be the same length.
- Second, if ``regex=True`` then all of the strings in **both**
lists will be interpreted as regexs otherwise they will match
directly. This doesn't matter much for `value` since there
are only a few possible substitution regexes you can use.
- str, regex and numeric rules apply as above.
* dict:
- Dicts can be used to specify different replacement values
for different existing values. For example,
``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
'y' with 'z'. To use a dict in this way the `value`
parameter should be `None`.
- For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
and the value 'z' in column 'b' and replaces these values
with whatever is specified in `value`. The `value` parameter
should not be ``None`` in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
- For a DataFrame nested dictionaries, e.g.,
``{'a': {'b': np.nan}}``, are read as follows: look in column
'a' for the value 'b' and replace it with NaN. The `value`
parameter should be ``None`` to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) **cannot** be regular expressions.
* None:
- This means that the `regex` argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If `value` is also ``None`` then
this **must** be a nested dictionary or Series.
See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
Value to replace any values matching `to_replace` with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
objects are also allowed.
inplace : bool, default False
If True, performs operation inplace and returns None.
limit : int, default None
Maximum size gap to forward or backward fill.
regex : bool or same types as `to_replace`, default False
Whether to interpret `to_replace` and/or `value` as regular
expressions. If this is ``True`` then `to_replace` *must* be a
string. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
`to_replace` must be ``None``.
method : {'pad', 'ffill', 'bfill', `None`}
The method to use when for replacement, when `to_replace` is a
scalar, list or tuple and `value` is ``None``.
.. versionchanged:: 0.23.0
Added to DataFrame.
Returns
-------
DataFrame
Object after replacement.
Raises
------
AssertionError
* If `regex` is not a ``bool`` and `to_replace` is not
``None``.
TypeError
* If `to_replace` is not a scalar, array-like, ``dict``, or ``None``
* If `to_replace` is a ``dict`` and `value` is not a ``list``,
``dict``, ``ndarray``, or ``Series``
* If `to_replace` is ``None`` and `regex` is not compilable
into a regular expression or is a list, dict, ndarray, or
Series.
* When replacing multiple ``bool`` or ``datetime64`` objects and
the arguments to `to_replace` does not match the type of the
value being replaced
ValueError
* If a ``list`` or an ``ndarray`` is passed to `to_replace` and
`value` but they are not the same length.
See Also
--------
DataFrame.fillna : Fill NA values.
DataFrame.where : Replace values based on boolean condition.
Series.str.replace : Simple string replacement.
Notes
-----
* Regex substitution is performed under the hood with ``re.sub``. The
rules for substitution for ``re.sub`` are the same.
* Regular expressions will only substitute on strings, meaning you
cannot provide, for example, a regular expression matching floating
point numbers and expect the columns in your frame that have a
numeric dtype to be matched. However, if those floating point
numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
* When dict is used as the `to_replace` value, it is like
key(s) in the dict are the to_replace part and
value(s) in the dict are the value parameter.
Examples
--------
**Scalar `to_replace` and `value`**
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0 5
1 2
2 3
3 4
4 5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
... 'B': [5, 6, 7, 8, 9],
... 'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
A B C
0 5 5 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e
**List-like `to_replace`**
>>> df.replace([0, 1, 2, 3], 4)
A B C
0 4 5 a
1 4 6 b
2 4 7 c
3 4 8 d
4 4 9 e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
A B C
0 4 5 a
1 3 6 b
2 2 7 c
3 1 8 d
4 4 9 e
>>> s.replace([1, 2], method='bfill')
0 3
1 3
2 3
3 4
4 5
dtype: int64
**dict-like `to_replace`**
>>> df.replace({0: 10, 1: 100})
A B C
0 10 5 a
1 100 6 b
2 2 7 c
3 3 8 d
4 4 9 e
>>> df.replace({'A': 0, 'B': 5}, 100)
A B C
0 100 100 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e
>>> df.replace({'A': {0: 100, 4: 400}})
A B C
0 100 5 a
1 1 6 b
2 2 7 c
3 3 8 d
4 400 9 e
**Regular expression `to_replace`**
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
... 'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
A B
0 new abc
1 foo new
2 bait xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
A B
0 new abc
1 foo bar
2 bait xyz
>>> df.replace(regex=r'^ba.$', value='new')
A B
0 new abc
1 foo new
2 bait xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
A B
0 new abc
1 xyz new
2 bait xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
A B
0 new abc
1 new new
2 bait xyz
Compare the behavior of ``s.replace({'a': None})`` and
``s.replace('a', None)`` to understand the peculiarities
of the `to_replace` parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the `to_replace` value, it is like the
value(s) in the dict are equal to the `value` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:
>>> s.replace({'a': None})
0 10
1 None
2 None
3 b
4 None
dtype: object
When ``value`` is not explicitly passed and `to_replace` is a scalar, list
or tuple, `replace` uses the method parameter (default 'pad') to do the
replacement. So this is why the 'a' values are being replaced by 10
in rows 1 and 2 and 'b' in row 4 in this case.
>>> s.replace('a')
0 10
1 10
2 10
3 b
4 b
dtype: object
On the other hand, if ``None`` is explicitly passed for ``value``, it will
be respected:
>>> s.replace('a', None)
0 10
1 None
2 None
3 b
4 None
dtype: object
.. versionchanged:: 1.4.0
Previously the explicit ``None`` was silently ignored.
- resample(self, rule, axis=0, closed: 'str | None' = None, label: 'str | None' = None, convention: 'str' = 'start', kind: 'str | None' = None, loffset=None, base: 'int | None' = None, on=None, level=None, origin: 'str | TimestampConvertibleTypes' = 'start_day', offset: 'TimedeltaConvertibleTypes | None' = None) -> 'Resampler'
- Resample time-series data.
Convenience method for frequency conversion and resampling of time series.
The object must have a datetime-like index (`DatetimeIndex`, `PeriodIndex`,
or `TimedeltaIndex`), or the caller must pass the label of a datetime-like
series/index to the ``on``/``level`` keyword parameter.
Parameters
----------
rule : DateOffset, Timedelta or str
The offset string or object representing target conversion.
axis : {0 or 'index', 1 or 'columns'}, default 0
Which axis to use for up- or down-sampling. For `Series` this
will default to 0, i.e. along the rows. Must be
`DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`.
closed : {'right', 'left'}, default None
Which side of bin interval is closed. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
label : {'right', 'left'}, default None
Which bin edge label to label bucket with. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
convention : {'start', 'end', 's', 'e'}, default 'start'
For `PeriodIndex` only, controls whether to use the start or
end of `rule`.
kind : {'timestamp', 'period'}, optional, default None
Pass 'timestamp' to convert the resulting index to a
`DateTimeIndex` or 'period' to convert it to a `PeriodIndex`.
By default the input representation is retained.
loffset : timedelta, default None
Adjust the resampled time labels.
.. deprecated:: 1.1.0
You should add the loffset to the `df.index` after the resample.
See below.
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0.
.. deprecated:: 1.1.0
The new arguments that you should use are 'offset' or 'origin'.
on : str, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
level : str or int, optional
For a MultiIndex, level (name or number) to use for
resampling. `level` must be datetime-like.
origin : Timestamp or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin
must match the timezone of the index.
If string, must be one of the following:
- 'epoch': `origin` is 1970-01-01
- 'start': `origin` is the first value of the timeseries
- 'start_day': `origin` is the first day at midnight of the timeseries
.. versionadded:: 1.1.0
- 'end': `origin` is the last value of the timeseries
- 'end_day': `origin` is the ceiling midnight of the last day
.. versionadded:: 1.3.0
offset : Timedelta or str, default is None
An offset timedelta added to the origin.
.. versionadded:: 1.1.0
Returns
-------
pandas.core.Resampler
:class:`~pandas.core.Resampler` object.
See Also
--------
Series.resample : Resample a Series.
DataFrame.resample : Resample a DataFrame.
groupby : Group DataFrame by mapping, function, label, or list of labels.
asfreq : Reindex a DataFrame with the given frequency without grouping.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling>`__
for more.
To learn more about the offset strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects>`__.
Examples
--------
Start by creating a series with 9 one minute timestamps.
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
Freq: T, dtype: int64
Downsample the series into 3 minute bins and sum the values
of the timestamps falling into a bin.
>>> series.resample('3T').sum()
2000-01-01 00:00:00 3
2000-01-01 00:03:00 12
2000-01-01 00:06:00 21
Freq: 3T, dtype: int64
Downsample the series into 3 minute bins as above, but label each
bin using the right edge instead of the left. Please note that the
value in the bucket used as the label is not included in the bucket,
which it labels. For example, in the original series the
bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
value in the resampled bucket with the label ``2000-01-01 00:03:00``
does not include 3 (if it did, the summed value would be 6, not 3).
To include this value close the right side of the bin interval as
illustrated in the example below this one.
>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00 3
2000-01-01 00:06:00 12
2000-01-01 00:09:00 21
Freq: 3T, dtype: int64
Downsample the series into 3 minute bins as above, but close the right
side of the bin interval.
>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00 0
2000-01-01 00:03:00 6
2000-01-01 00:06:00 15
2000-01-01 00:09:00 15
Freq: 3T, dtype: int64
Upsample the series into 30 second bins.
>>> series.resample('30S').asfreq()[0:5] # Select first 5 rows
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 1.0
2000-01-01 00:01:30 NaN
2000-01-01 00:02:00 2.0
Freq: 30S, dtype: float64
Upsample the series into 30 second bins and fill the ``NaN``
values using the ``pad`` method.
>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00 0
2000-01-01 00:00:30 0
2000-01-01 00:01:00 1
2000-01-01 00:01:30 1
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
Upsample the series into 30 second bins and fill the
``NaN`` values using the ``bfill`` method.
>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00 0
2000-01-01 00:00:30 1
2000-01-01 00:01:00 1
2000-01-01 00:01:30 2
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
Pass a custom function via ``apply``
>>> def custom_resampler(arraylike):
... return np.sum(arraylike) + 5
...
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00 8
2000-01-01 00:03:00 17
2000-01-01 00:06:00 26
Freq: 3T, dtype: int64
For a Series with a PeriodIndex, the keyword `convention` can be
used to control whether to use the start or end of `rule`.
Resample a year by quarter using 'start' `convention`. Values are
assigned to the first quarter of the period.
>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
... freq='A',
... periods=2))
>>> s
2012 1
2013 2
Freq: A-DEC, dtype: int64
>>> s.resample('Q', convention='start').asfreq()
2012Q1 1.0
2012Q2 NaN
2012Q3 NaN
2012Q4 NaN
2013Q1 2.0
2013Q2 NaN
2013Q3 NaN
2013Q4 NaN
Freq: Q-DEC, dtype: float64
Resample quarters by month using 'end' `convention`. Values are
assigned to the last month of the period.
>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
... freq='Q',
... periods=4))
>>> q
2018Q1 1
2018Q2 2
2018Q3 3
2018Q4 4
Freq: Q-DEC, dtype: int64
>>> q.resample('M', convention='end').asfreq()
2018-03 1.0
2018-04 NaN
2018-05 NaN
2018-06 2.0
2018-07 NaN
2018-08 NaN
2018-09 3.0
2018-10 NaN
2018-11 NaN
2018-12 4.0
Freq: M, dtype: float64
For DataFrame objects, the keyword `on` can be used to specify the
column instead of the index for resampling.
>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
... periods=8,
... freq='W')
>>> df
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
>>> df.resample('M', on='week_starting').mean()
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0
For a DataFrame with MultiIndex, the keyword `level` can be used to
specify on which level the resampling needs to take place.
>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(
... d2,
... index=pd.MultiIndex.from_product(
... [days, ['morning', 'afternoon']]
... )
... )
>>> df2
price volume
2000-01-01 morning 10 50
afternoon 11 60
2000-01-02 morning 9 40
afternoon 13 100
2000-01-03 morning 14 50
afternoon 18 100
2000-01-04 morning 17 40
afternoon 19 50
>>> df2.resample('D', level=0).sum()
price volume
2000-01-01 21 110
2000-01-02 22 140
2000-01-03 32 150
2000-01-04 36 90
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7T, dtype: int64
>>> ts.resample('17min').sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='epoch').sum()
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='2000-01-01').sum()
2000-10-01 23:24:00 3
2000-10-01 23:41:00 15
2000-10-01 23:58:00 45
2000-10-02 00:15:00 45
Freq: 17T, dtype: int64
If you want to adjust the start of the bins with an `offset` Timedelta, the two
following lines are equivalent:
>>> ts.resample('17min', origin='start').sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
>>> ts.resample('17min', offset='23h30min').sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
If you want to take the largest Timestamp as the end of the bins:
>>> ts.resample('17min', origin='end').sum()
2000-10-01 23:35:00 0
2000-10-01 23:52:00 18
2000-10-02 00:09:00 27
2000-10-02 00:26:00 63
Freq: 17T, dtype: int64
In contrast with the `start_day`, you can use `end_day` to take the ceiling
midnight of the largest Timestamp as the end of the bins and drop the bins
not containing data:
>>> ts.resample('17min', origin='end_day').sum()
2000-10-01 23:38:00 3
2000-10-01 23:55:00 15
2000-10-02 00:12:00 45
2000-10-02 00:29:00 45
Freq: 17T, dtype: int64
To replace the use of the deprecated `base` argument, you can now use `offset`,
in this example it is equivalent to have `base=2`:
>>> ts.resample('17min', offset='2min').sum()
2000-10-01 23:16:00 0
2000-10-01 23:33:00 9
2000-10-01 23:50:00 36
2000-10-02 00:07:00 39
2000-10-02 00:24:00 24
Freq: 17T, dtype: int64
To replace the use of the deprecated `loffset` argument:
>>> from pandas.tseries.frequencies import to_offset
>>> loffset = '19min'
>>> ts_out = ts.resample('17min').sum()
>>> ts_out.index = ts_out.index + to_offset(loffset)
>>> ts_out
2000-10-01 23:33:00 0
2000-10-01 23:50:00 9
2000-10-02 00:07:00 21
2000-10-02 00:24:00 54
2000-10-02 00:41:00 24
Freq: 17T, dtype: int64
- reset_index(self, level: 'Hashable | Sequence[Hashable] | None' = None, drop: 'bool' = False, inplace: 'bool' = False, col_level: 'Hashable' = 0, col_fill: 'Hashable' = '') -> 'DataFrame | None'
- Reset the index, or a level of it.
Reset the index of the DataFrame, and use the default one instead.
If the DataFrame has a MultiIndex, this method can remove one or more
levels.
Parameters
----------
level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by
default.
drop : bool, default False
Do not try to insert index into dataframe columns. This resets
the index to the default integer index.
inplace : bool, default False
Modify the DataFrame in place (do not create a new object).
col_level : int or str, default 0
If the columns have multiple levels, determines which level the
labels are inserted into. By default it is inserted into the first
level.
col_fill : object, default ''
If the columns have multiple levels, determines how the other
levels are named. If None then the index name is repeated.
Returns
-------
DataFrame or None
DataFrame with the new index or None if ``inplace=True``.
See Also
--------
DataFrame.set_index : Opposite of reset_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.
Examples
--------
>>> df = pd.DataFrame([('bird', 389.0),
... ('bird', 24.0),
... ('mammal', 80.5),
... ('mammal', np.nan)],
... index=['falcon', 'parrot', 'lion', 'monkey'],
... columns=('class', 'max_speed'))
>>> df
class max_speed
falcon bird 389.0
parrot bird 24.0
lion mammal 80.5
monkey mammal NaN
When we reset the index, the old index is added as a column, and a
new sequential index is used:
>>> df.reset_index()
index class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal NaN
We can use the `drop` parameter to avoid the old index being added as
a column:
>>> df.reset_index(drop=True)
class max_speed
0 bird 389.0
1 bird 24.0
2 mammal 80.5
3 mammal NaN
You can also use `reset_index` with `MultiIndex`.
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
... ('bird', 'parrot'),
... ('mammal', 'lion'),
... ('mammal', 'monkey')],
... names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
... ('species', 'type')])
>>> df = pd.DataFrame([(389.0, 'fly'),
... ( 24.0, 'fly'),
... ( 80.5, 'run'),
... (np.nan, 'jump')],
... index=index,
... columns=columns)
>>> df
speed species
max type
class name
bird falcon 389.0 fly
parrot 24.0 fly
mammal lion 80.5 run
monkey NaN jump
If the index has multiple levels, we can reset a subset of them:
>>> df.reset_index(level='class')
class speed species
max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
If we are not dropping the index, by default, it is placed in the top
level. We can place it in another level:
>>> df.reset_index(level='class', col_level=1)
speed species
class max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
When the index is inserted under another level, we can specify under
which one with the parameter `col_fill`:
>>> df.reset_index(level='class', col_level=1, col_fill='species')
species speed species
class max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
If we specify a nonexistent level for `col_fill`, it is created:
>>> df.reset_index(level='class', col_level=1, col_fill='genus')
genus speed species
class max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
- rfloordiv(self, other, axis='columns', level=None, fill_value=None)
- Get Integer division of dataframe and other, element-wise (binary operator `rfloordiv`).
Equivalent to ``other // dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `floordiv`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- rmod(self, other, axis='columns', level=None, fill_value=None)
- Get Modulo of dataframe and other, element-wise (binary operator `rmod`).
Equivalent to ``other % dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `mod`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- rmul(self, other, axis='columns', level=None, fill_value=None)
- Get Multiplication of dataframe and other, element-wise (binary operator `rmul`).
Equivalent to ``other * dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `mul`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- round(self, decimals: 'int | dict[IndexLabel, int] | Series' = 0, *args, **kwargs) -> 'DataFrame'
- Round a DataFrame to a variable number of decimal places.
Parameters
----------
decimals : int, dict, Series
Number of decimal places to round each column to. If an int is
given, round each column to the same number of places.
Otherwise dict and Series round to variable numbers of places.
Column names should be in the keys if `decimals` is a
dict-like, or in the index if `decimals` is a Series. Any
columns not included in `decimals` will be left as is. Elements
of `decimals` which are not columns of the input will be
ignored.
*args
Additional keywords have no effect but might be accepted for
compatibility with numpy.
**kwargs
Additional keywords have no effect but might be accepted for
compatibility with numpy.
Returns
-------
DataFrame
A DataFrame with the affected columns rounded to the specified
number of decimal places.
See Also
--------
numpy.around : Round a numpy array to the given number of decimals.
Series.round : Round a Series to the given number of decimals.
Examples
--------
>>> df = pd.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
... columns=['dogs', 'cats'])
>>> df
dogs cats
0 0.21 0.32
1 0.01 0.67
2 0.66 0.03
3 0.21 0.18
By providing an integer each column is rounded to the same number
of decimal places
>>> df.round(1)
dogs cats
0 0.2 0.3
1 0.0 0.7
2 0.7 0.0
3 0.2 0.2
With a dict, the number of places for specific columns can be
specified with the column names as key and the number of decimal
places as value
>>> df.round({'dogs': 1, 'cats': 0})
dogs cats
0 0.2 0.0
1 0.0 1.0
2 0.7 0.0
3 0.2 0.0
Using a Series, the number of places for specific columns can be
specified with the column names as index and the number of
decimal places as value
>>> decimals = pd.Series([0, 1], index=['cats', 'dogs'])
>>> df.round(decimals)
dogs cats
0 0.2 0.0
1 0.0 1.0
2 0.7 0.0
3 0.2 0.0
- rpow(self, other, axis='columns', level=None, fill_value=None)
- Get Exponential power of dataframe and other, element-wise (binary operator `rpow`).
Equivalent to ``other ** dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `pow`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- rsub(self, other, axis='columns', level=None, fill_value=None)
- Get Subtraction of dataframe and other, element-wise (binary operator `rsub`).
Equivalent to ``other - dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `sub`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- rtruediv(self, other, axis='columns', level=None, fill_value=None)
- Get Floating division of dataframe and other, element-wise (binary operator `rtruediv`).
Equivalent to ``other / dataframe``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `truediv`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- select_dtypes(self, include=None, exclude=None) -> 'DataFrame'
- Return a subset of the DataFrame's columns based on the column dtypes.
Parameters
----------
include, exclude : scalar or list-like
A selection of dtypes or strings to be included/excluded. At least
one of these parameters must be supplied.
Returns
-------
DataFrame
The subset of the frame including the dtypes in ``include`` and
excluding the dtypes in ``exclude``.
Raises
------
ValueError
* If both of ``include`` and ``exclude`` are empty
* If ``include`` and ``exclude`` have overlapping elements
* If any kind of string dtype is passed in.
See Also
--------
DataFrame.dtypes: Return Series with the data type of each column.
Notes
-----
* To select all *numeric* types, use ``np.number`` or ``'number'``
* To select strings you must use the ``object`` dtype, but note that
this will return *all* object dtype columns
* See the `numpy dtype hierarchy
<https://numpy.org/doc/stable/reference/arrays.scalars.html>`__
* To select datetimes, use ``np.datetime64``, ``'datetime'`` or
``'datetime64'``
* To select timedeltas, use ``np.timedelta64``, ``'timedelta'`` or
``'timedelta64'``
* To select Pandas categorical dtypes, use ``'category'``
* To select Pandas datetimetz dtypes, use ``'datetimetz'`` (new in
0.20.0) or ``'datetime64[ns, tz]'``
Examples
--------
>>> df = pd.DataFrame({'a': [1, 2] * 3,
... 'b': [True, False] * 3,
... 'c': [1.0, 2.0] * 3})
>>> df
a b c
0 1 True 1.0
1 2 False 2.0
2 1 True 1.0
3 2 False 2.0
4 1 True 1.0
5 2 False 2.0
>>> df.select_dtypes(include='bool')
b
0 True
1 False
2 True
3 False
4 True
5 False
>>> df.select_dtypes(include=['float64'])
c
0 1.0
1 2.0
2 1.0
3 2.0
4 1.0
5 2.0
>>> df.select_dtypes(exclude=['int64'])
b c
0 True 1.0
1 False 2.0
2 True 1.0
3 False 2.0
4 True 1.0
5 False 2.0
- sem(self, axis=None, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)
- Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
Parameters
----------
axis : {index (0), columns (1)}
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
Series or DataFrame (if level specified)
- set_axis(self, labels, axis: 'Axis' = 0, inplace: 'bool' = False)
- Assign desired index to given axis.
Indexes for column or row labels can be changed by assigning
a list-like or Index.
Parameters
----------
labels : list-like, Index
The values for the new index.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to update. The value 0 identifies the rows, and 1 identifies the columns.
inplace : bool, default False
Whether to return a new DataFrame instance.
Returns
-------
renamed : DataFrame or None
An object of type DataFrame or None if ``inplace=True``.
See Also
--------
DataFrame.rename_axis : Alter the name of the index or columns.
Examples
--------
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
Change the row labels.
>>> df.set_axis(['a', 'b', 'c'], axis='index')
A B
a 1 4
b 2 5
c 3 6
Change the column labels.
>>> df.set_axis(['I', 'II'], axis='columns')
I II
0 1 4
1 2 5
2 3 6
Now, update the labels inplace.
>>> df.set_axis(['i', 'ii'], axis='columns', inplace=True)
>>> df
i ii
0 1 4
1 2 5
2 3 6
- set_index(self, keys, drop: 'bool' = True, append: 'bool' = False, inplace: 'bool' = False, verify_integrity: 'bool' = False)
- Set the DataFrame index using existing columns.
Set the DataFrame index (row labels) using one or more existing
columns or arrays (of the correct length). The index can replace the
existing index or expand on it.
Parameters
----------
keys : label or array-like or list of labels/arrays
This parameter can be either a single column key, a single array of
the same length as the calling DataFrame, or a list containing an
arbitrary combination of column keys and arrays. Here, "array"
encompasses :class:`Series`, :class:`Index`, ``np.ndarray``, and
instances of :class:`~collections.abc.Iterator`.
drop : bool, default True
Delete columns to be used as the new index.
append : bool, default False
Whether to append columns to existing index.
inplace : bool, default False
If True, modifies the DataFrame in place (do not create a new object).
verify_integrity : bool, default False
Check the new index for duplicates. Otherwise defer the check until
necessary. Setting to False will improve the performance of this
method.
Returns
-------
DataFrame or None
Changed row labels or None if ``inplace=True``.
See Also
--------
DataFrame.reset_index : Opposite of set_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.
Examples
--------
>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
... 'year': [2012, 2014, 2013, 2014],
... 'sale': [55, 40, 84, 31]})
>>> df
month year sale
0 1 2012 55
1 4 2014 40
2 7 2013 84
3 10 2014 31
Set the index to become the 'month' column:
>>> df.set_index('month')
year sale
month
1 2012 55
4 2014 40
7 2013 84
10 2014 31
Create a MultiIndex using columns 'year' and 'month':
>>> df.set_index(['year', 'month'])
sale
year month
2012 1 55
2014 4 40
2013 7 84
2014 10 31
Create a MultiIndex using an Index and a column:
>>> df.set_index([pd.Index([1, 2, 3, 4]), 'year'])
month sale
year
1 2012 1 55
2 2014 4 40
3 2013 7 84
4 2014 10 31
Create a MultiIndex using two Series:
>>> s = pd.Series([1, 2, 3, 4])
>>> df.set_index([s, s**2])
month year sale
1 1 1 2012 55
2 4 4 2014 40
3 9 7 2013 84
4 16 10 2014 31
- shift(self, periods=1, freq: 'Frequency | None' = None, axis: 'Axis' = 0, fill_value=<no_default>) -> 'DataFrame'
- Shift index by desired number of periods with an optional time `freq`.
When `freq` is not passed, shift the index without realigning the data.
If `freq` is passed (in this case, the index must be date or datetime,
or it will raise a `NotImplementedError`), the index will be
increased using the periods and the `freq`. `freq` can be inferred
when specified as "infer" as long as either freq or inferred_freq
attribute is set in the index.
Parameters
----------
periods : int
Number of periods to shift. Can be positive or negative.
freq : DateOffset, tseries.offsets, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. 'EOM').
If `freq` is specified then the index values are shifted but the
data is not realigned. That is, use `freq` if you would like to
extend the index when shifting and preserve the original data.
If `freq` is specified as "infer" then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exist, a ValueError is thrown.
axis : {0 or 'index', 1 or 'columns', None}, default None
Shift direction.
fill_value : object, optional
The scalar value to use for newly introduced missing values.
the default depends on the dtype of `self`.
For numeric data, ``np.nan`` is used.
For datetime, timedelta, or period data, etc. :attr:`NaT` is used.
For extension dtypes, ``self.dtype.na_value`` is used.
.. versionchanged:: 1.1.0
Returns
-------
DataFrame
Copy of input object, shifted.
See Also
--------
Index.shift : Shift values of Index.
DatetimeIndex.shift : Shift values of DatetimeIndex.
PeriodIndex.shift : Shift values of PeriodIndex.
tshift : Shift the time index, using the index's frequency if
available.
Examples
--------
>>> df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
... "Col2": [13, 23, 18, 33, 48],
... "Col3": [17, 27, 22, 37, 52]},
... index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
Col1 Col2 Col3
2020-01-01 10 13 17
2020-01-02 20 23 27
2020-01-03 15 18 22
2020-01-04 30 33 37
2020-01-05 45 48 52
>>> df.shift(periods=3)
Col1 Col2 Col3
2020-01-01 NaN NaN NaN
2020-01-02 NaN NaN NaN
2020-01-03 NaN NaN NaN
2020-01-04 10.0 13.0 17.0
2020-01-05 20.0 23.0 27.0
>>> df.shift(periods=1, axis="columns")
Col1 Col2 Col3
2020-01-01 NaN 10 13
2020-01-02 NaN 20 23
2020-01-03 NaN 15 18
2020-01-04 NaN 30 33
2020-01-05 NaN 45 48
>>> df.shift(periods=3, fill_value=0)
Col1 Col2 Col3
2020-01-01 0 0 0
2020-01-02 0 0 0
2020-01-03 0 0 0
2020-01-04 10 13 17
2020-01-05 20 23 27
>>> df.shift(periods=3, freq="D")
Col1 Col2 Col3
2020-01-04 10 13 17
2020-01-05 20 23 27
2020-01-06 15 18 22
2020-01-07 30 33 37
2020-01-08 45 48 52
>>> df.shift(periods=3, freq="infer")
Col1 Col2 Col3
2020-01-04 10 13 17
2020-01-05 20 23 27
2020-01-06 15 18 22
2020-01-07 30 33 37
2020-01-08 45 48 52
- skew(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return unbiased skew over requested axis.
Normalized by N-1.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
- sort_index(self, axis: 'Axis' = 0, level: 'Level | None' = None, ascending: 'bool | int | Sequence[bool | int]' = True, inplace: 'bool' = False, kind: 'str' = 'quicksort', na_position: 'str' = 'last', sort_remaining: 'bool' = True, ignore_index: 'bool' = False, key: 'IndexKeyFunc' = None)
- Sort object by labels (along an axis).
Returns a new DataFrame sorted by label if `inplace` argument is
``False``, otherwise updates the original DataFrame and returns None.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis along which to sort. The value 0 identifies the rows,
and 1 identifies the columns.
level : int or level name or list of ints or list of level names
If not None, sort on values in specified index level(s).
ascending : bool or list-like of bools, default True
Sort ascending vs. descending. When the index is a MultiIndex the
sort direction can be controlled for each level individually.
inplace : bool, default False
If True, perform operation in-place.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'
Choice of sorting algorithm. See also :func:`numpy.sort` for more
information. `mergesort` and `stable` are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.
Not implemented for MultiIndex.
sort_remaining : bool, default True
If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.0.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape. For MultiIndex
inputs, the key is applied *per level*.
.. versionadded:: 1.1.0
Returns
-------
DataFrame or None
The original DataFrame sorted by the labels or None if ``inplace=True``.
See Also
--------
Series.sort_index : Sort Series by the index.
DataFrame.sort_values : Sort DataFrame by the value.
Series.sort_values : Sort Series by the value.
Examples
--------
>>> df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150],
... columns=['A'])
>>> df.sort_index()
A
1 4
29 2
100 1
150 5
234 3
By default, it sorts in ascending order, to sort in descending order,
use ``ascending=False``
>>> df.sort_index(ascending=False)
A
234 3
150 5
100 1
29 2
1 4
A key function can be specified which is applied to the index before
sorting. For a ``MultiIndex`` this is applied to each level separately.
>>> df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=['A', 'b', 'C', 'd'])
>>> df.sort_index(key=lambda x: x.str.lower())
a
A 1
b 2
C 3
d 4
- sort_values(self, by, axis: 'Axis' = 0, ascending=True, inplace: 'bool' = False, kind: 'str' = 'quicksort', na_position: 'str' = 'last', ignore_index: 'bool' = False, key: 'ValueKeyFunc' = None)
- Sort by the values along either axis.
Parameters
----------
by : str or list of str
Name or list of names to sort by.
- if `axis` is 0 or `'index'` then `by` may contain index
levels and/or column labels.
- if `axis` is 1 or `'columns'` then `by` may contain column
levels and/or index labels.
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis to be sorted.
ascending : bool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort
orders. If this is a list of bools, must match the length of
the by.
inplace : bool, default False
If True, perform operation in-place.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'
Choice of sorting algorithm. See also :func:`numpy.sort` for more
information. `mergesort` and `stable` are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
Puts NaNs at the beginning if `first`; `last` puts NaNs at the
end.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.0.0
key : callable, optional
Apply the key function to the values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect a
``Series`` and return a Series with the same shape as the input.
It will be applied to each column in `by` independently.
.. versionadded:: 1.1.0
Returns
-------
DataFrame or None
DataFrame with sorted values or None if ``inplace=True``.
See Also
--------
DataFrame.sort_index : Sort a DataFrame by the index.
Series.sort_values : Similar method for a Series.
Examples
--------
>>> df = pd.DataFrame({
... 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
... 'col2': [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... 'col4': ['a', 'B', 'c', 'D', 'e', 'F']
... })
>>> df
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 NaN 8 4 D
4 D 7 2 e
5 C 4 3 F
Sort by col1
>>> df.sort_values(by=['col1'])
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
5 C 4 3 F
4 D 7 2 e
3 NaN 8 4 D
Sort by multiple columns
>>> df.sort_values(by=['col1', 'col2'])
col1 col2 col3 col4
1 A 1 1 B
0 A 2 0 a
2 B 9 9 c
5 C 4 3 F
4 D 7 2 e
3 NaN 8 4 D
Sort Descending
>>> df.sort_values(by='col1', ascending=False)
col1 col2 col3 col4
4 D 7 2 e
5 C 4 3 F
2 B 9 9 c
0 A 2 0 a
1 A 1 1 B
3 NaN 8 4 D
Putting NAs first
>>> df.sort_values(by='col1', ascending=False, na_position='first')
col1 col2 col3 col4
3 NaN 8 4 D
4 D 7 2 e
5 C 4 3 F
2 B 9 9 c
0 A 2 0 a
1 A 1 1 B
Sorting with a key function
>>> df.sort_values(by='col4', key=lambda col: col.str.lower())
col1 col2 col3 col4
0 A 2 0 a
1 A 1 1 B
2 B 9 9 c
3 NaN 8 4 D
4 D 7 2 e
5 C 4 3 F
Natural sort with the key argument,
using the `natsort <https://github.com/SethMMorton/natsort>` package.
>>> df = pd.DataFrame({
... "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
... "value": [10, 20, 30, 40, 50]
... })
>>> df
time value
0 0hr 10
1 128hr 20
2 72hr 30
3 48hr 40
4 96hr 50
>>> from natsort import index_natsorted
>>> df.sort_values(
... by="time",
... key=lambda x: np.argsort(index_natsorted(df["time"]))
... )
time value
0 0hr 10
3 48hr 40
2 72hr 30
4 96hr 50
1 128hr 20
- stack(self, level: 'Level' = -1, dropna: 'bool' = True)
- Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
DataFrame. The new inner-most levels are created by pivoting the
columns of the current dataframe:
- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index
level(s) is (are) taken from the prescribed level(s) and
the output is a DataFrame.
Parameters
----------
level : int, str, list, default -1
Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
dropna : bool, default True
Whether to drop rows in the resulting Frame/Series with
missing values. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.
Returns
-------
DataFrame or Series
Stacked dataframe or series.
See Also
--------
DataFrame.unstack : Unstack prescribed level(s) from index axis
onto column axis.
DataFrame.pivot : Reshape dataframe from long format to wide
format.
DataFrame.pivot_table : Create a spreadsheet-style pivot table
as a DataFrame.
Notes
-----
The function is named by analogy with a collection of books
being reorganized from being side by side on a horizontal
position (the columns of the dataframe) to being stacked
vertically on top of each other (in the index of the
dataframe).
Reference :ref:`the user guide <reshaping.stacking>` for more examples.
Examples
--------
**Single level columns**
>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
... index=['cat', 'dog'],
... columns=['weight', 'height'])
Stacking a dataframe with a single level column axis returns a Series:
>>> df_single_level_cols
weight height
cat 0 1
dog 2 3
>>> df_single_level_cols.stack()
cat weight 0
height 1
dog weight 2
height 3
dtype: int64
**Multi level columns: simple case**
>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
... ('weight', 'pounds')])
>>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
... index=['cat', 'dog'],
... columns=multicol1)
Stacking a dataframe with a multi-level column axis:
>>> df_multi_level_cols1
weight
kg pounds
cat 1 2
dog 2 4
>>> df_multi_level_cols1.stack()
weight
cat kg 1
pounds 2
dog kg 2
pounds 4
**Missing values**
>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
... ('height', 'm')])
>>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
... index=['cat', 'dog'],
... columns=multicol2)
It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:
>>> df_multi_level_cols2
weight height
kg m
cat 1.0 2.0
dog 3.0 4.0
>>> df_multi_level_cols2.stack()
height weight
cat kg NaN 1.0
m 2.0 NaN
dog kg NaN 3.0
m 4.0 NaN
**Prescribing the level(s) to be stacked**
The first parameter controls which level or levels are stacked:
>>> df_multi_level_cols2.stack(0)
kg m
cat height NaN 2.0
weight 1.0 NaN
dog height NaN 4.0
weight 3.0 NaN
>>> df_multi_level_cols2.stack([0, 1])
cat height m 2.0
weight kg 1.0
dog height m 4.0
weight kg 3.0
dtype: float64
**Dropping missing values**
>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
... index=['cat', 'dog'],
... columns=multicol2)
Note that rows where all values are missing are dropped by
default but this behaviour can be controlled via the dropna
keyword parameter:
>>> df_multi_level_cols3
weight height
kg m
cat NaN 1.0
dog 2.0 3.0
>>> df_multi_level_cols3.stack(dropna=False)
height weight
cat kg NaN NaN
m 1.0 NaN
dog kg NaN 2.0
m 3.0 NaN
>>> df_multi_level_cols3.stack(dropna=True)
height weight
cat m 1.0 NaN
dog kg NaN 2.0
m 3.0 NaN
- std(self, axis=None, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)
- Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters
----------
axis : {index (0), columns (1)}
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
Series or DataFrame (if level specified)
Notes
-----
To have the same behaviour as `numpy.std`, use `ddof=0` (instead of the
default `ddof=1`)
Examples
--------
>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
... 'age': [21, 25, 62, 43],
... 'height': [1.61, 1.87, 1.49, 2.01]}
... ).set_index('person_id')
>>> df
age height
person_id
0 21 1.61
1 25 1.87
2 62 1.49
3 43 2.01
The standard deviation of the columns can be found as follows:
>>> df.std()
age 18.786076
height 0.237417
Alternatively, `ddof=0` can be set to normalize by N instead of N-1:
>>> df.std(ddof=0)
age 16.269219
height 0.205609
- sub(self, other, axis='columns', level=None, fill_value=None)
- Get Subtraction of dataframe and other, element-wise (binary operator `sub`).
Equivalent to ``dataframe - other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rsub`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- subtract = sub(self, other, axis='columns', level=None, fill_value=None)
- sum(self, axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
- Return the sum of the values over the requested axis.
This is equivalent to the method ``numpy.sum``.
Parameters
----------
axis : {index (0), columns (1)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
Series or DataFrame (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
>>> idx = pd.MultiIndex.from_arrays([
... ['warm', 'warm', 'cold', 'cold'],
... ['dog', 'falcon', 'fish', 'spider']],
... names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded animal
warm dog 4
falcon 2
cold fish 0
spider 8
Name: legs, dtype: int64
>>> s.sum()
14
By default, the sum of an empty or all-NA Series is ``0``.
>>> pd.Series([], dtype="float64").sum() # min_count=0 is the default
0.0
This can be controlled with the ``min_count`` parameter. For example, if
you'd like the sum of an empty series to be NaN, pass ``min_count=1``.
>>> pd.Series([], dtype="float64").sum(min_count=1)
nan
Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
empty series identically.
>>> pd.Series([np.nan]).sum()
0.0
>>> pd.Series([np.nan]).sum(min_count=1)
nan
- swaplevel(self, i: 'Axis' = -2, j: 'Axis' = -1, axis: 'Axis' = 0) -> 'DataFrame'
- Swap levels i and j in a :class:`MultiIndex`.
Default is to swap the two innermost levels of the index.
Parameters
----------
i, j : int or str
Levels of the indices to be swapped. Can pass level name as string.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to swap levels on. 0 or 'index' for row-wise, 1 or
'columns' for column-wise.
Returns
-------
DataFrame
DataFrame with levels swapped in MultiIndex.
Examples
--------
>>> df = pd.DataFrame(
... {"Grade": ["A", "B", "A", "C"]},
... index=[
... ["Final exam", "Final exam", "Coursework", "Coursework"],
... ["History", "Geography", "History", "Geography"],
... ["January", "February", "March", "April"],
... ],
... )
>>> df
Grade
Final exam History January A
Geography February B
Coursework History March A
Geography April C
In the following example, we will swap the levels of the indices.
Here, we will swap the levels column-wise, but levels can be swapped row-wise
in a similar manner. Note that column-wise is the default behaviour.
By not supplying any arguments for i and j, we swap the last and second to
last indices.
>>> df.swaplevel()
Grade
Final exam January History A
February Geography B
Coursework March History A
April Geography C
By supplying one argument, we can choose which index to swap the last
index with. We can for example swap the first index with the last one as
follows.
>>> df.swaplevel(0)
Grade
January History Final exam A
February Geography Final exam B
March History Coursework A
April Geography Coursework C
We can also define explicitly which indices we want to swap by supplying values
for both i and j. Here, we for example swap the first and second indices.
>>> df.swaplevel(0, 1)
Grade
History Final exam January A
Geography Final exam February B
History Coursework March A
Geography Coursework April C
- to_dict(self, orient: 'str' = 'dict', into=<class 'dict'>)
- Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters
(see below).
Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
Determines the type of the values of the dictionary.
- 'dict' (default) : dict like {column -> {index -> value}}
- 'list' : dict like {column -> [values]}
- 'series' : dict like {column -> Series(values)}
- 'split' : dict like
{'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
- 'tight' : dict like
{'index' -> [index], 'columns' -> [columns], 'data' -> [values],
'index_names' -> [index.names], 'column_names' -> [column.names]}
- 'records' : list like
[{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
Abbreviations are allowed. `s` indicates `series` and `sp`
indicates `split`.
.. versionadded:: 1.4.0
'tight' as an allowed value for the ``orient`` argument
into : class, default dict
The collections.abc.Mapping subclass used for all Mappings
in the return value. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
Returns
-------
dict, list or collections.abc.Mapping
Return a collections.abc.Mapping object representing the DataFrame.
The resulting transformation depends on the `orient` parameter.
See Also
--------
DataFrame.from_dict: Create a DataFrame from a dictionary.
DataFrame.to_json: Convert a DataFrame to JSON format.
Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df
col1 col2
row1 1 0.50
row2 2 0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can specify the return orientation.
>>> df.to_dict('series')
{'col1': row1 1
row2 2
Name: col1, dtype: int64,
'col2': row1 0.50
row2 0.75
Name: col2, dtype: float64}
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]]}
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
>>> df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}
You can also specify the mapping type.
>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
If you want a `defaultdict`, you need to initialize it:
>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
- to_feather(self, path: 'FilePath | WriteBuffer[bytes]', **kwargs) -> 'None'
- Write a DataFrame to the binary Feather format.
Parameters
----------
path : str, path object, file-like object
String, path object (implementing ``os.PathLike[str]``), or file-like
object implementing a binary ``write()`` function. If a string or a path,
it will be used as Root Directory path when writing a partitioned dataset.
**kwargs :
Additional keywords passed to :func:`pyarrow.feather.write_feather`.
Starting with pyarrow 0.17, this includes the `compression`,
`compression_level`, `chunksize` and `version` keywords.
.. versionadded:: 1.1.0
Notes
-----
This function writes the dataframe as a `feather file
<https://arrow.apache.org/docs/python/feather.html>`_. Requires a default
index. For saving the DataFrame with your custom index use a method that
supports custom indices e.g. `to_parquet`.
- to_gbq(self, destination_table: 'str', project_id: 'str | None' = None, chunksize: 'int | None' = None, reauth: 'bool' = False, if_exists: 'str' = 'fail', auth_local_webserver: 'bool' = False, table_schema: 'list[dict[str, str]] | None' = None, location: 'str | None' = None, progress_bar: 'bool' = True, credentials=None) -> 'None'
- Write a DataFrame to a Google BigQuery table.
This function requires the `pandas-gbq package
<https://pandas-gbq.readthedocs.io>`__.
See the `How to authenticate with Google BigQuery
<https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html>`__
guide for authentication instructions.
Parameters
----------
destination_table : str
Name of table to be written, in the form ``dataset.tablename``.
project_id : str, optional
Google BigQuery Account project ID. Optional when available from
the environment.
chunksize : int, optional
Number of rows to be inserted in each chunk from the dataframe.
Set to ``None`` to load the whole dataframe at once.
reauth : bool, default False
Force Google BigQuery to re-authenticate the user. This is useful
if multiple accounts are used.
if_exists : str, default 'fail'
Behavior when the destination table exists. Value can be one of:
``'fail'``
If table exists raise pandas_gbq.gbq.TableCreationError.
``'replace'``
If table exists, drop it, recreate it, and insert data.
``'append'``
If table exists, insert data. Create if does not exist.
auth_local_webserver : bool, default False
Use the `local webserver flow`_ instead of the `console flow`_
when getting user credentials.
.. _local webserver flow:
https://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_local_server
.. _console flow:
https://google-auth-oauthlib.readthedocs.io/en/latest/reference/google_auth_oauthlib.flow.html#google_auth_oauthlib.flow.InstalledAppFlow.run_console
*New in version 0.2.0 of pandas-gbq*.
table_schema : list of dicts, optional
List of BigQuery table fields to which according DataFrame
columns conform to, e.g. ``[{'name': 'col1', 'type':
'STRING'},...]``. If schema is not provided, it will be
generated according to dtypes of DataFrame columns. See
BigQuery API documentation on available names of a field.
*New in version 0.3.1 of pandas-gbq*.
location : str, optional
Location where the load job should run. See the `BigQuery locations
documentation
<https://cloud.google.com/bigquery/docs/dataset-locations>`__ for a
list of available locations. The location must match that of the
target dataset.
*New in version 0.5.0 of pandas-gbq*.
progress_bar : bool, default True
Use the library `tqdm` to show the progress bar for the upload,
chunk by chunk.
*New in version 0.5.0 of pandas-gbq*.
credentials : google.auth.credentials.Credentials, optional
Credentials for accessing Google APIs. Use this parameter to
override default credentials, such as to use Compute Engine
:class:`google.auth.compute_engine.Credentials` or Service
Account :class:`google.oauth2.service_account.Credentials`
directly.
*New in version 0.8.0 of pandas-gbq*.
See Also
--------
pandas_gbq.to_gbq : This function in the pandas-gbq library.
read_gbq : Read a DataFrame from Google BigQuery.
- to_html(self, buf: 'FilePath | WriteBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'ColspaceArgType | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'FormattersType | None' = None, float_format: 'FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool | str' = False, decimal: 'str' = '.', bold_rows: 'bool' = True, classes: 'str | list | tuple | None' = None, escape: 'bool' = True, notebook: 'bool' = False, border: 'int | None' = None, table_id: 'str | None' = None, render_links: 'bool' = False, encoding: 'str | None' = None)
- Render a DataFrame as an HTML table.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : str or int, list or dict of int or str, optional
The minimum width of each column in CSS length units. An int is assumed to be px units.
.. versionadded:: 0.25.0
Ability to use str.
header : bool, optional
Whether to print column labels, default True.
index : bool, optional, default True
Whether to print index (row) labels.
na_rep : str, optional, default 'NaN'
String representation of ``NaN`` to use.
formatters : list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format : one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are
floats. This function must return a unicode string and will be
applied only to the non-``NaN`` elements, with ``NaN`` being
handled by ``na_rep``.
.. versionchanged:: 1.2.0
sparsify : bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
justify : str, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), 'right' out
of the box. Valid values are
* left
* right
* center
* justify
* justify-all
* start
* end
* inherit
* match-parent
* initial
* unset.
max_rows : int, optional
Maximum number of rows to display in the console.
max_cols : int, optional
Maximum number of columns to display in the console.
show_dimensions : bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
bold_rows : bool, default True
Make the row labels bold in the output.
classes : str or list or tuple, default None
CSS class(es) to apply to the resulting html table.
escape : bool, default True
Convert the characters <, >, and & to HTML-safe sequences.
notebook : {True, False}, default False
Whether the generated HTML is for IPython Notebook.
border : int
A ``border=border`` attribute is included in the opening
`<table>` tag. Default ``pd.options.display.html.border``.
table_id : str, optional
A css id is included in the opening `<table>` tag if specified.
render_links : bool, default False
Convert URLs to HTML links.
encoding : str, default "utf-8"
Set character encoding.
.. versionadded:: 1.0
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
to_string : Convert DataFrame to a string.
- to_markdown(self, buf: 'IO[str] | str | None' = None, mode: 'str' = 'wt', index: 'bool' = True, storage_options: 'StorageOptions' = None, **kwargs) -> 'str | None'
- Print DataFrame in Markdown-friendly format.
.. versionadded:: 1.0.0
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
mode : str, optional
Mode in which file is opened, "wt" by default.
index : bool, optional, default True
Add index (row) labels.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
**kwargs
These parameters will be passed to `tabulate <https://pypi.org/project/tabulate>`_.
Returns
-------
str
DataFrame in Markdown-friendly format.
Notes
-----
Requires the `tabulate <https://pypi.org/project/tabulate>`_ package.
Examples
--------
>>> df = pd.DataFrame(
... data={"animal_1": ["elk", "pig"], "animal_2": ["dog", "quetzal"]}
... )
>>> print(df.to_markdown())
| | animal_1 | animal_2 |
|---:|:-----------|:-----------|
| 0 | elk | dog |
| 1 | pig | quetzal |
Output markdown with a tabulate option.
>>> print(df.to_markdown(tablefmt="grid"))
+----+------------+------------+
| | animal_1 | animal_2 |
+====+============+============+
| 0 | elk | dog |
+----+------------+------------+
| 1 | pig | quetzal |
+----+------------+------------+
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>) -> 'np.ndarray'
- Convert the DataFrame to a NumPy array.
By default, the dtype of the returned array will be the common NumPy
dtype of all types in the DataFrame. For example, if the dtypes are
``float16`` and ``float32``, the results dtype will be ``float32``.
This may require copying data and coercing values, which may be
expensive.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the dtypes of the DataFrame columns.
.. versionadded:: 1.1.0
Returns
-------
numpy.ndarray
See Also
--------
Series.to_numpy : Similar method for Series.
Examples
--------
>>> pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
[2, 4]])
With heterogeneous data, the lowest common type will have to
be used.
>>> df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
>>> df.to_numpy()
array([[1. , 3. ],
[2. , 4.5]])
For a mix of numeric and non-numeric types, the output array will
have object dtype.
>>> df['C'] = pd.date_range('2000', periods=2)
>>> df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
[2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object)
- to_parquet(self, path: 'FilePath | WriteBuffer[bytes] | None' = None, engine: 'str' = 'auto', compression: 'str | None' = 'snappy', index: 'bool | None' = None, partition_cols: 'list[str] | None' = None, storage_options: 'StorageOptions' = None, **kwargs) -> 'bytes | None'
- Write a DataFrame to the binary parquet format.
This function writes the dataframe as a `parquet file
<https://parquet.apache.org/>`_. You can choose different parquet
backends, and have the option of compression. See
:ref:`the user guide <io.parquet>` for more details.
Parameters
----------
path : str, path object, file-like object, or None, default None
String, path object (implementing ``os.PathLike[str]``), or file-like
object implementing a binary ``write()`` function. If None, the result is
returned as bytes. If a string or path, it will be used as Root Directory
path when writing a partitioned dataset.
.. versionchanged:: 1.2.0
Previously this was "fname"
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
Parquet library to use. If 'auto', then the option
``io.parquet.engine`` is used. The default ``io.parquet.engine``
behavior is to try 'pyarrow', falling back to 'fastparquet' if
'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
Name of the compression to use. Use ``None`` for no compression.
index : bool, default None
If ``True``, include the dataframe's index(es) in the file output.
If ``False``, they will not be written to the file.
If ``None``, similar to ``True`` the dataframe's index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn't require much space and is faster. Other indexes will
be included as columns in the file output.
partition_cols : list, optional, default None
Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
Must be None if path is not a string.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
**kwargs
Additional arguments passed to the parquet library. See
:ref:`pandas io <io.parquet>` for more details.
Returns
-------
bytes if no path argument is provided else None
See Also
--------
read_parquet : Read a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.
Notes
-----
This function requires either the `fastparquet
<https://pypi.org/project/fastparquet>`_ or `pyarrow
<https://arrow.apache.org/docs/python/>`_ library.
Examples
--------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
... compression='gzip') # doctest: +SKIP
>>> pd.read_parquet('df.parquet.gzip') # doctest: +SKIP
col1 col2
0 1 3
1 2 4
If you want to get a buffer to the parquet content you can use a io.BytesIO
object, as long as you don't use partition_cols, which creates multiple files.
>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
- to_period(self, freq: 'Frequency | None' = None, axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
- Convert DataFrame from DatetimeIndex to PeriodIndex.
Convert DataFrame from DatetimeIndex to PeriodIndex with desired
frequency (inferred from index if not passed).
Parameters
----------
freq : str, default
Frequency of the PeriodIndex.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).
copy : bool, default True
If False then underlying input data is not copied.
Returns
-------
DataFrame with PeriodIndex
Examples
--------
>>> idx = pd.to_datetime(
... [
... "2001-03-31 00:00:00",
... "2002-05-31 00:00:00",
... "2003-08-31 00:00:00",
... ]
... )
>>> idx
DatetimeIndex(['2001-03-31', '2002-05-31', '2003-08-31'],
dtype='datetime64[ns]', freq=None)
>>> idx.to_period("M")
PeriodIndex(['2001-03', '2002-05', '2003-08'], dtype='period[M]')
For the yearly frequency
>>> idx.to_period("Y")
PeriodIndex(['2001', '2002', '2003'], dtype='period[A-DEC]')
- to_records(self, index=True, column_dtypes=None, index_dtypes=None) -> 'np.recarray'
- Convert DataFrame to a NumPy record array.
Index will be included as the first field of the record array if
requested.
Parameters
----------
index : bool, default True
Include index in resulting record array, stored in 'index'
field or using the index label, if set.
column_dtypes : str, type, dict, default None
If a string or type, the data type to store all columns. If
a dictionary, a mapping of column names and indices (zero-indexed)
to specific data types.
index_dtypes : str, type, dict, default None
If a string or type, the data type to store all index levels. If
a dictionary, a mapping of index level names and indices
(zero-indexed) to specific data types.
This mapping is applied only if `index=True`.
Returns
-------
numpy.recarray
NumPy ndarray with the DataFrame labels as fields and each row
of the DataFrame as entries.
See Also
--------
DataFrame.from_records: Convert structured or record ndarray
to DataFrame.
numpy.recarray: An ndarray that allows field access using
attributes, analogous to typed columns in a
spreadsheet.
Examples
--------
>>> df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 0.75]},
... index=['a', 'b'])
>>> df
A B
a 1 0.50
b 2 0.75
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('index', 'O'), ('A', '<i8'), ('B', '<f8')])
If the DataFrame index has no label then the recarray field name
is set to 'index'. If the index has a label then this is used as the
field name:
>>> df.index = df.index.rename("I")
>>> df.to_records()
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('I', 'O'), ('A', '<i8'), ('B', '<f8')])
The index can be excluded from the record array:
>>> df.to_records(index=False)
rec.array([(1, 0.5 ), (2, 0.75)],
dtype=[('A', '<i8'), ('B', '<f8')])
Data types can be specified for the columns:
>>> df.to_records(column_dtypes={"A": "int32"})
rec.array([('a', 1, 0.5 ), ('b', 2, 0.75)],
dtype=[('I', 'O'), ('A', '<i4'), ('B', '<f8')])
As well as for the index:
>>> df.to_records(index_dtypes="<S2")
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
dtype=[('I', 'S2'), ('A', '<i8'), ('B', '<f8')])
>>> index_dtypes = f"<S{df.index.str.len().max()}"
>>> df.to_records(index_dtypes=index_dtypes)
rec.array([(b'a', 1, 0.5 ), (b'b', 2, 0.75)],
dtype=[('I', 'S1'), ('A', '<i8'), ('B', '<f8')])
- to_stata(self, path: 'FilePath | WriteBuffer[bytes]', convert_dates: 'dict[Hashable, str] | None' = None, write_index: 'bool' = True, byteorder: 'str | None' = None, time_stamp: 'datetime.datetime | None' = None, data_label: 'str | None' = None, variable_labels: 'dict[Hashable, str] | None' = None, version: 'int | None' = 114, convert_strl: 'Sequence[Hashable] | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None, *, value_labels: 'dict[Hashable, dict[float | int, str]] | None' = None) -> 'None'
- Export DataFrame object to Stata dta format.
Writes the DataFrame to a Stata dataset file.
"dta" files contain a Stata dataset.
Parameters
----------
path : str, path object, or buffer
String, path object (implementing ``os.PathLike[str]``), or file-like
object implementing a binary ``write()`` function.
.. versionchanged:: 1.0.0
Previously this was "fname"
convert_dates : dict
Dictionary mapping columns containing datetime types to stata
internal format to use when writing the dates. Options are 'tc',
'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer
or a name. Datetime columns that do not have a conversion type
specified will be converted to 'tc'. Raises NotImplementedError if
a datetime column has timezone information.
write_index : bool
Write the index to Stata dataset.
byteorder : str
Can be ">", "<", "little", or "big". default is `sys.byteorder`.
time_stamp : datetime
A datetime to use as file creation date. Default is the current
time.
data_label : str, optional
A label for the data set. Must be 80 characters or smaller.
variable_labels : dict
Dictionary containing columns as keys and variable labels as
values. Each label must be 80 characters or smaller.
version : {114, 117, 118, 119, None}, default 114
Version to use in the output dta file. Set to None to let pandas
decide between 118 or 119 formats depending on the number of
columns in the frame. Version 114 can be read by Stata 10 and
later. Version 117 can be read by Stata 13 or later. Version 118
is supported in Stata 14 and later. Version 119 is supported in
Stata 15 and later. Version 114 limits string variables to 244
characters or fewer while versions 117 and later allow strings
with lengths up to 2,000,000 characters. Versions 118 and 119
support Unicode characters, and version 119 supports more than
32,767 variables.
Version 119 should usually only be used when the number of
variables exceeds the capacity of dta format 118. Exporting
smaller datasets in format 119 may have unintended consequences,
and, as of November 2020, Stata SE cannot read version 119 files.
.. versionchanged:: 1.0.0
Added support for formats 118 and 119.
convert_strl : list, optional
List of column names to convert to string columns to Stata StrL
format. Only available if version is 117. Storing strings in the
StrL format can produce smaller dta files if strings have more than
8 characters and values are repeated.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
.. versionadded:: 1.1.0
.. versionchanged:: 1.4.0 Zstandard support.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
value_labels : dict of dicts
Dictionary containing columns as keys and dictionaries of column value
to labels as values. Labels for a single variable must be 32,000
characters or smaller.
.. versionadded:: 1.4.0
Raises
------
NotImplementedError
* If datetimes contain timezone information
* Column dtype is not representable in Stata
ValueError
* Columns listed in convert_dates are neither datetime64[ns]
or datetime.datetime
* Column listed in convert_dates is not in DataFrame
* Categorical label contains more than 32,000 characters
See Also
--------
read_stata : Import Stata data files.
io.stata.StataWriter : Low-level writer for Stata data files.
io.stata.StataWriter117 : Low-level writer for version 117 files.
Examples
--------
>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon',
... 'parrot'],
... 'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta') # doctest: +SKIP
- to_string(self, buf: 'FilePath | WriteBuffer[str] | None' = None, columns: 'Sequence[str] | None' = None, col_space: 'int | list[int] | dict[Hashable, int] | None' = None, header: 'bool | Sequence[str]' = True, index: 'bool' = True, na_rep: 'str' = 'NaN', formatters: 'fmt.FormattersType | None' = None, float_format: 'fmt.FloatFormatType | None' = None, sparsify: 'bool | None' = None, index_names: 'bool' = True, justify: 'str | None' = None, max_rows: 'int | None' = None, max_cols: 'int | None' = None, show_dimensions: 'bool' = False, decimal: 'str' = '.', line_width: 'int | None' = None, min_rows: 'int | None' = None, max_colwidth: 'int | None' = None, encoding: 'str | None' = None) -> 'str | None'
- Render a DataFrame to a console-friendly tabular output.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space : int, list or dict of int, optional
The minimum width of each column. If a list of ints is given every integers corresponds with one column. If a dict is given, the key references the column, while the value defines the space to use..
header : bool or sequence of str, optional
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index : bool, optional, default True
Whether to print index (row) labels.
na_rep : str, optional, default 'NaN'
String representation of ``NaN`` to use.
formatters : list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns' elements by position or
name.
The result of each function must be a unicode string.
List/tuple must be of length equal to the number of columns.
float_format : one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are
floats. This function must return a unicode string and will be
applied only to the non-``NaN`` elements, with ``NaN`` being
handled by ``na_rep``.
.. versionchanged:: 1.2.0
sparsify : bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row.
index_names : bool, optional, default True
Prints the names of the indexes.
justify : str, default None
How to justify the column labels. If None uses the option from
the print configuration (controlled by set_option), 'right' out
of the box. Valid values are
* left
* right
* center
* justify
* justify-all
* start
* end
* inherit
* match-parent
* initial
* unset.
max_rows : int, optional
Maximum number of rows to display in the console.
max_cols : int, optional
Maximum number of columns to display in the console.
show_dimensions : bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
line_width : int, optional
Width to wrap a line in characters.
min_rows : int, optional
The number of rows to display in the console in a truncated repr
(when number of rows is above `max_rows`).
max_colwidth : int, optional
Max width to truncate each column in characters. By default, no limit.
.. versionadded:: 1.0.0
encoding : str, default "utf-8"
Set character encoding.
.. versionadded:: 1.0
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
to_html : Convert DataFrame to HTML.
Examples
--------
>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
col1 col2
0 1 4
1 2 5
2 3 6
- to_timestamp(self, freq: 'Frequency | None' = None, how: 'str' = 'start', axis: 'Axis' = 0, copy: 'bool' = True) -> 'DataFrame'
- Cast to DatetimeIndex of timestamps, at *beginning* of period.
Parameters
----------
freq : str, default frequency of PeriodIndex
Desired frequency.
how : {'s', 'e', 'start', 'end'}
Convention for converting period to timestamp; start of period
vs. end.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to convert (the index by default).
copy : bool, default True
If False then underlying input data is not copied.
Returns
-------
DataFrame with DatetimeIndex
- to_xml(self, path_or_buffer: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, index: 'bool' = True, root_name: 'str | None' = 'data', row_name: 'str | None' = 'row', na_rep: 'str | None' = None, attr_cols: 'list[str] | None' = None, elem_cols: 'list[str] | None' = None, namespaces: 'dict[str | None, str] | None' = None, prefix: 'str | None' = None, encoding: 'str' = 'utf-8', xml_declaration: 'bool | None' = True, pretty_print: 'bool | None' = True, parser: 'str | None' = 'lxml', stylesheet: 'FilePath | ReadBuffer[str] | ReadBuffer[bytes] | None' = None, compression: 'CompressionOptions' = 'infer', storage_options: 'StorageOptions' = None) -> 'str | None'
- Render a DataFrame to an XML document.
.. versionadded:: 1.3.0
Parameters
----------
path_or_buffer : str, path object, file-like object, or None, default None
String, path object (implementing ``os.PathLike[str]``), or file-like
object implementing a ``write()`` function. If None, the result is returned
as a string.
index : bool, default True
Whether to include index in XML document.
root_name : str, default 'data'
The name of root element in XML document.
row_name : str, default 'row'
The name of row element in XML document.
na_rep : str, optional
Missing data representation.
attr_cols : list-like, optional
List of columns to write as attributes in row element.
Hierarchical columns will be flattened with underscore
delimiting the different levels.
elem_cols : list-like, optional
List of columns to write as children in row element. By default,
all columns output as children of row element. Hierarchical
columns will be flattened with underscore delimiting the
different levels.
namespaces : dict, optional
All namespaces to be defined in root element. Keys of dict
should be prefix names and values of dict corresponding URIs.
Default namespaces should be given empty string key. For
example, ::
namespaces = {"": "https://example.com"}
prefix : str, optional
Namespace prefix to be used for every element and/or attribute
in document. This should be one of the keys in ``namespaces``
dict.
encoding : str, default 'utf-8'
Encoding of the resulting document.
xml_declaration : bool, default True
Whether to include the XML declaration at start of document.
pretty_print : bool, default True
Whether output should be pretty printed with indentation and
line breaks.
parser : {'lxml','etree'}, default 'lxml'
Parser module to use for building of tree. Only 'lxml' and
'etree' are supported. With 'lxml', the ability to use XSLT
stylesheet is supported.
stylesheet : str, path object or file-like object, optional
A URL, file-like object, or a raw string containing an XSLT
script used to transform the raw XML output. Script should use
layout of elements and attributes from original output. This
argument requires ``lxml`` to be installed. Only XSLT 1.0
scripts and not later versions is currently supported.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path_or_buffer'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
.. versionchanged:: 1.4.0 Zstandard support.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
Returns
-------
None or str
If ``io`` is None, returns the resulting XML format as a
string. Otherwise returns None.
See Also
--------
to_json : Convert the pandas object to a JSON string.
to_html : Convert DataFrame to a html.
Examples
--------
>>> df = pd.DataFrame({'shape': ['square', 'circle', 'triangle'],
... 'degrees': [360, 360, 180],
... 'sides': [4, np.nan, 3]})
>>> df.to_xml() # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<index>1</index>
<shape>circle</shape>
<degrees>360</degrees>
<sides/>
</row>
<row>
<index>2</index>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>
>>> df.to_xml(attr_cols=[
... 'index', 'shape', 'degrees', 'sides'
... ]) # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<data>
<row index="0" shape="square" degrees="360" sides="4.0"/>
<row index="1" shape="circle" degrees="360"/>
<row index="2" shape="triangle" degrees="180" sides="3.0"/>
</data>
>>> df.to_xml(namespaces={"doc": "https://example.com"},
... prefix="doc") # doctest: +SKIP
<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
<doc:row>
<doc:index>0</doc:index>
<doc:shape>square</doc:shape>
<doc:degrees>360</doc:degrees>
<doc:sides>4.0</doc:sides>
</doc:row>
<doc:row>
<doc:index>1</doc:index>
<doc:shape>circle</doc:shape>
<doc:degrees>360</doc:degrees>
<doc:sides/>
</doc:row>
<doc:row>
<doc:index>2</doc:index>
<doc:shape>triangle</doc:shape>
<doc:degrees>180</doc:degrees>
<doc:sides>3.0</doc:sides>
</doc:row>
</doc:data>
- transform(self, func: 'AggFuncType', axis: 'Axis' = 0, *args, **kwargs) -> 'DataFrame'
- Call ``func`` on self producing a DataFrame with the same axis shape as self.
Parameters
----------
func : function, str, list-like or dict-like
Function to use for transforming the data. If a function, must either
work when passed a DataFrame or when passed to DataFrame.apply. If func
is both list-like and dict-like, dict-like behavior takes precedence.
Accepted combinations are:
- function
- string function name
- list-like of functions and/or function names, e.g. ``[np.exp, 'sqrt']``
- dict-like of axis labels -> functions, function names or list-like of such.
axis : {0 or 'index', 1 or 'columns'}, default 0
If 0 or 'index': apply function to each column.
If 1 or 'columns': apply function to each row.
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Returns
-------
DataFrame
A DataFrame that must have the same length as self.
Raises
------
ValueError : If the returned DataFrame has a different length than self.
See Also
--------
DataFrame.agg : Only perform aggregating type operations.
DataFrame.apply : Invoke function on a DataFrame.
Notes
-----
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
Examples
--------
>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df
A B
0 0 1
1 1 2
2 2 3
>>> df.transform(lambda x: x + 1)
A B
0 1 2
1 2 3
2 3 4
Even though the resulting DataFrame must have the same length as the
input DataFrame, it is possible to provide several input functions:
>>> s = pd.Series(range(3))
>>> s
0 0
1 1
2 2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
sqrt exp
0 0.000000 1.000000
1 1.000000 2.718282
2 1.414214 7.389056
You can call transform on a GroupBy object:
>>> df = pd.DataFrame({
... "Date": [
... "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
... "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
... "Data": [5, 8, 6, 1, 50, 100, 60, 120],
... })
>>> df
Date Data
0 2015-05-08 5
1 2015-05-07 8
2 2015-05-06 6
3 2015-05-05 1
4 2015-05-08 50
5 2015-05-07 100
6 2015-05-06 60
7 2015-05-05 120
>>> df.groupby('Date')['Data'].transform('sum')
0 55
1 108
2 66
3 121
4 55
5 108
6 66
7 121
Name: Data, dtype: int64
>>> df = pd.DataFrame({
... "c": [1, 1, 1, 2, 2, 2, 2],
... "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
c type
0 1 m
1 1 n
2 1 o
3 2 m
4 2 m
5 2 n
6 2 n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
c type size
0 1 m 3
1 1 n 3
2 1 o 3
3 2 m 4
4 2 m 4
5 2 n 4
6 2 n 4
- transpose(self, *args, copy: 'bool' = False) -> 'DataFrame'
- Transpose index and columns.
Reflect the DataFrame over its main diagonal by writing rows as columns
and vice-versa. The property :attr:`.T` is an accessor to the method
:meth:`transpose`.
Parameters
----------
*args : tuple, optional
Accepted for compatibility with NumPy.
copy : bool, default False
Whether to copy the data after transposing, even for DataFrames
with a single dtype.
Note that a copy is always required for mixed dtype DataFrames,
or for DataFrames with any extension types.
Returns
-------
DataFrame
The transposed DataFrame.
See Also
--------
numpy.transpose : Permute the dimensions of a given array.
Notes
-----
Transposing a DataFrame with mixed dtypes will result in a homogeneous
DataFrame with the `object` dtype. In such a case, a copy of the data
is always made.
Examples
--------
**Square DataFrame with homogeneous dtype**
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
col1 col2
0 1 3
1 2 4
>>> df1_transposed = df1.T # or df1.transpose()
>>> df1_transposed
0 1
col1 1 2
col2 3 4
When the dtype is homogeneous in the original DataFrame, we get a
transposed DataFrame with the same dtype:
>>> df1.dtypes
col1 int64
col2 int64
dtype: object
>>> df1_transposed.dtypes
0 int64
1 int64
dtype: object
**Non-square DataFrame with mixed dtypes**
>>> d2 = {'name': ['Alice', 'Bob'],
... 'score': [9.5, 8],
... 'employed': [False, True],
... 'kids': [0, 0]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
name score employed kids
0 Alice 9.5 False 0
1 Bob 8.0 True 0
>>> df2_transposed = df2.T # or df2.transpose()
>>> df2_transposed
0 1
name Alice Bob
score 9.5 8.0
employed False True
kids 0 0
When the DataFrame has mixed dtypes, we get a transposed DataFrame with
the `object` dtype:
>>> df2.dtypes
name object
score float64
employed bool
kids int64
dtype: object
>>> df2_transposed.dtypes
0 object
1 object
dtype: object
- truediv(self, other, axis='columns', level=None, fill_value=None)
- Get Floating division of dataframe and other, element-wise (binary operator `truediv`).
Equivalent to ``dataframe / other``, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rtruediv`.
Among flexible wrappers (`add`, `sub`, `mul`, `div`, `mod`, `pow`) to
arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`.
Parameters
----------
other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
Whether to compare by the index (0 or 'index') or columns
(1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : float or None, default None
Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
Returns
-------
DataFrame
Result of the arithmetic operation.
See Also
--------
DataFrame.add : Add DataFrames.
DataFrame.sub : Subtract DataFrames.
DataFrame.mul : Multiply DataFrames.
DataFrame.div : Divide DataFrames (float division).
DataFrame.truediv : Divide DataFrames (float division).
DataFrame.floordiv : Divide DataFrames (integer division).
DataFrame.mod : Calculate modulo (remainder after division).
DataFrame.pow : Calculate exponential power.
Notes
-----
Mismatched indices will be unioned together.
Examples
--------
>>> df = pd.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
Add a scalar with operator version which return the same
results.
>>> df + 1
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
>>> df.add(1)
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
Divide by constant with reverse version.
>>> df.div(10)
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
>>> df.rdiv(10)
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub([1, 2], axis='columns')
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index')
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other
angles
circle 0
triangle 3
rectangle 4
>>> df * other
angles degrees
circle 0 NaN
triangle 9 NaN
rectangle 16 NaN
>>> df.mul(other, fill_value=0)
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
... 'degrees': [360, 180, 360, 360, 540, 720]},
... index=[['A', 'A', 'A', 'B', 'B', 'B'],
... ['circle', 'triangle', 'rectangle',
... 'square', 'pentagon', 'hexagon']])
>>> df_multindex
angles degrees
A circle 0 360
triangle 3 180
rectangle 4 360
B square 4 360
pentagon 5 540
hexagon 6 720
>>> df.div(df_multindex, level=1, fill_value=0)
angles degrees
A circle NaN 1.0
triangle 1.0 1.0
rectangle 1.0 1.0
B square 0.0 0.0
pentagon 0.0 0.0
hexagon 0.0 0.0
- unstack(self, level: 'Level' = -1, fill_value=None)
- Pivot a level of the (necessarily hierarchical) index labels.
Returns a DataFrame having a new level of column labels whose inner-most level
consists of the pivoted index labels.
If the index is not a MultiIndex, the output will be a Series
(the analogue of stack when the columns are not a MultiIndex).
Parameters
----------
level : int, str, or list of these, default -1 (last level)
Level(s) of index to unstack, can pass level name.
fill_value : int, str or dict
Replace NaN with this value if the unstack produces missing values.
Returns
-------
Series or DataFrame
See Also
--------
DataFrame.pivot : Pivot a table based on column values.
DataFrame.stack : Pivot a level of the column labels (inverse operation
from `unstack`).
Notes
-----
Reference :ref:`the user guide <reshaping.stacking>` for more examples.
Examples
--------
>>> index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'),
... ('two', 'a'), ('two', 'b')])
>>> s = pd.Series(np.arange(1.0, 5.0), index=index)
>>> s
one a 1.0
b 2.0
two a 3.0
b 4.0
dtype: float64
>>> s.unstack(level=-1)
a b
one 1.0 2.0
two 3.0 4.0
>>> s.unstack(level=0)
one two
a 1.0 3.0
b 2.0 4.0
>>> df = s.unstack(level=0)
>>> df.unstack()
one a 1.0
b 2.0
two a 3.0
b 4.0
dtype: float64
- update(self, other, join: 'str' = 'left', overwrite: 'bool' = True, filter_func=None, errors: 'str' = 'ignore') -> 'None'
- Modify in place using non-NA values from another DataFrame.
Aligns on indices. There is no return value.
Parameters
----------
other : DataFrame, or object coercible into a DataFrame
Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
join : {'left'}, default 'left'
Only left join is implemented, keeping the index and columns of the
original object.
overwrite : bool, default True
How to handle non-NA values for overlapping keys:
* True: overwrite original DataFrame's values
with values from `other`.
* False: only update values that are NA in
the original DataFrame.
filter_func : callable(1d-array) -> bool 1d-array, optional
Can choose to replace values other than NA. Return True for values
that should be updated.
errors : {'raise', 'ignore'}, default 'ignore'
If 'raise', will raise a ValueError if the DataFrame and `other`
both contain non-NA data in the same place.
Returns
-------
None : method directly changes calling object
Raises
------
ValueError
* When `errors='raise'` and there's overlapping non-NA data.
* When `errors` is not either `'ignore'` or `'raise'`
NotImplementedError
* If `join != 'left'`
See Also
--------
dict.update : Similar method for dictionaries.
DataFrame.merge : For column(s)-on-column(s) operations.
Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
... 'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
A B
0 1 4
1 2 5
2 3 6
The DataFrame's length does not increase as a result of the update,
only values at matching index/column labels are updated.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
A B
0 a d
1 b e
2 c f
For Series, its name attribute must be set.
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df
A B
0 a d
1 b y
2 c e
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e']}, index=[1, 2])
>>> df.update(new_df)
>>> df
A B
0 a x
1 b d
2 c e
If `other` contains NaNs the corresponding values are not updated
in the original dataframe.
>>> df = pd.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
A B
0 1 4.0
1 2 500.0
2 3 6.0
- value_counts(self, subset: 'Sequence[Hashable] | None' = None, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, dropna: 'bool' = True)
- Return a Series containing counts of unique rows in the DataFrame.
.. versionadded:: 1.1.0
Parameters
----------
subset : list-like, optional
Columns to use when counting unique combinations.
normalize : bool, default False
Return proportions rather than frequencies.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
dropna : bool, default True
Don’t include counts of rows that contain NA values.
.. versionadded:: 1.3.0
Returns
-------
Series
See Also
--------
Series.value_counts: Equivalent method on Series.
Notes
-----
The returned Series will have a MultiIndex with one level per input
column. By default, rows that contain any NA values are omitted from
the result. By default, the resulting Series will be in descending
order so that the first element is the most frequently-occurring row.
Examples
--------
>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
... 'num_wings': [2, 0, 0, 0]},
... index=['falcon', 'dog', 'cat', 'ant'])
>>> df
num_legs num_wings
falcon 2 2
dog 4 0
cat 4 0
ant 6 0
>>> df.value_counts()
num_legs num_wings
4 0 2
2 2 1
6 0 1
dtype: int64
>>> df.value_counts(sort=False)
num_legs num_wings
2 2 1
4 0 2
6 0 1
dtype: int64
>>> df.value_counts(ascending=True)
num_legs num_wings
2 2 1
6 0 1
4 0 2
dtype: int64
>>> df.value_counts(normalize=True)
num_legs num_wings
4 0 0.50
2 2 0.25
6 0 0.25
dtype: float64
With `dropna` set to `False` we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
... 'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
first_name middle_name
0 John Smith
1 Anne <NA>
2 John <NA>
3 Beth Louise
>>> df.value_counts()
first_name middle_name
Beth Louise 1
John Smith 1
dtype: int64
>>> df.value_counts(dropna=False)
first_name middle_name
Anne NaN 1
Beth Louise 1
John Smith 1
NaN 1
dtype: int64
- var(self, axis=None, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)
- Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters
----------
axis : {index (0), columns (1)}
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
Series or DataFrame (if level specified)
Examples
--------
>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
... 'age': [21, 25, 62, 43],
... 'height': [1.61, 1.87, 1.49, 2.01]}
... ).set_index('person_id')
>>> df
age height
person_id
0 21 1.61
1 25 1.87
2 62 1.49
3 43 2.01
>>> df.var()
age 352.916667
height 0.056367
Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:
>>> df.var(ddof=0)
age 264.687500
height 0.042275
- where(self, cond, other=<no_default>, inplace=False, axis=None, level=None, errors='raise', try_cast=<no_default>)
- Replace values where the condition is False.
Parameters
----------
cond : bool Series/DataFrame, array-like, or callable
Where `cond` is True, keep the original value. Where
False, replace with corresponding value from `other`.
If `cond` is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn't check it).
other : scalar, Series/DataFrame, or callable
Entries where `cond` is False are replaced with
corresponding value from `other`.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn't check it).
inplace : bool, default False
Whether to perform the operation in place on the data.
axis : int, default None
Alignment axis if needed.
level : int, default None
Alignment level if needed.
errors : str, {'raise', 'ignore'}, default 'raise'
Note that currently this parameter won't affect
the results and will always coerce to a suitable dtype.
- 'raise' : allow exceptions to be raised.
- 'ignore' : suppress exceptions. On error return original object.
try_cast : bool, default None
Try to cast the result back to the input type (if possible).
.. deprecated:: 1.3.0
Manually cast back if necessary.
Returns
-------
Same type as caller or None if ``inplace=True``.
See Also
--------
:func:`DataFrame.mask` : Return an object of same shape as
self.
Notes
-----
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if ``cond`` is ``True`` the
element is used; otherwise the corresponding element from the DataFrame
``other`` is used.
The signature for :func:`DataFrame.where` differs from
:func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
``np.where(m, df1, df2)``.
For further details and examples see the ``where`` documentation in
:ref:`indexing <indexing.where_mask>`.
Examples
--------
>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
>>> s.mask(s > 0)
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
>>> s.where(s > 1, 10)
0 10
1 10
2 2
3 3
4 4
dtype: int64
>>> s.mask(s > 1, 10)
0 0
1 1
2 10
3 10
4 10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df)
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> df.where(m, -df) == np.where(m, df, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> df.where(m, -df) == df.mask(~m, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
Class methods defined here:
- from_dict(data, orient: 'str' = 'columns', dtype: 'Dtype | None' = None, columns=None) -> 'DataFrame' from builtins.type
- Construct DataFrame from dict of array-like or dicts.
Creates DataFrame object from dictionary by columns or by index
allowing dtype specification.
Parameters
----------
data : dict
Of the form {field : array-like} or {field : dict}.
orient : {'columns', 'index', 'tight'}, default 'columns'
The "orientation" of the data. If the keys of the passed dict
should be the columns of the resulting DataFrame, pass 'columns'
(default). Otherwise if the keys should be rows, pass 'index'.
If 'tight', assume a dict with keys ['index', 'columns', 'data',
'index_names', 'column_names'].
.. versionadded:: 1.4.0
'tight' as an allowed value for the ``orient`` argument
dtype : dtype, default None
Data type to force, otherwise infer.
columns : list, default None
Column labels to use when ``orient='index'``. Raises a ValueError
if used with ``orient='columns'`` or ``orient='tight'``.
Returns
-------
DataFrame
See Also
--------
DataFrame.from_records : DataFrame from structured ndarray, sequence
of tuples or dicts, or DataFrame.
DataFrame : DataFrame object creation using constructor.
DataFrame.to_dict : Convert the DataFrame to a dictionary.
Examples
--------
By default the keys of the dict become the DataFrame columns:
>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
Specify ``orient='index'`` to create the DataFrame using dictionary
keys as rows:
>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data, orient='index')
0 1 2 3
row_1 3 2 1 0
row_2 a b c d
When using the 'index' orientation, the column names can be
specified manually:
>>> pd.DataFrame.from_dict(data, orient='index',
... columns=['A', 'B', 'C', 'D'])
A B C D
row_1 3 2 1 0
row_2 a b c d
Specify ``orient='tight'`` to create the DataFrame using a 'tight'
format:
>>> data = {'index': [('a', 'b'), ('a', 'c')],
... 'columns': [('x', 1), ('y', 2)],
... 'data': [[1, 3], [2, 4]],
... 'index_names': ['n1', 'n2'],
... 'column_names': ['z1', 'z2']}
>>> pd.DataFrame.from_dict(data, orient='tight')
z1 x y
z2 1 2
n1 n2
a b 1 3
c 2 4
- from_records(data, index=None, exclude=None, columns=None, coerce_float: 'bool' = False, nrows: 'int | None' = None) -> 'DataFrame' from builtins.type
- Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of
tuples or dicts, or DataFrame.
Parameters
----------
data : structured ndarray, sequence of tuples or dicts, or DataFrame
Structured input data.
index : str, list of fields, array-like
Field of array to use as the index, alternately a specific set of
input labels to use.
exclude : sequence, default None
Columns or fields to exclude.
columns : sequence, default None
Column names to use. If the passed data do not have names
associated with them, this argument provides names for the
columns. Otherwise this argument indicates the order of the columns
in the result (any names not found in the data will become all-NA
columns).
coerce_float : bool, default False
Attempt to convert values of non-string, non-numeric objects (like
decimal.Decimal) to floating point, useful for SQL result sets.
nrows : int, default None
Number of rows to read if data is an iterator.
Returns
-------
DataFrame
See Also
--------
DataFrame.from_dict : DataFrame from dict of array-like or dicts.
DataFrame : DataFrame object creation using constructor.
Examples
--------
Data can be provided as a structured ndarray:
>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
... dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
Data can be provided as a list of dicts:
>>> data = [{'col_1': 3, 'col_2': 'a'},
... {'col_1': 2, 'col_2': 'b'},
... {'col_1': 1, 'col_2': 'c'},
... {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
Data can be provided as a list of tuples with corresponding columns:
>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
Readonly properties defined here:
- T
- axes
- Return a list representing the axes of the DataFrame.
It has the row axis labels and column axis labels as the only members.
They are returned in that order.
Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.axes
[RangeIndex(start=0, stop=2, step=1), Index(['col1', 'col2'],
dtype='object')]
- shape
- Return a tuple representing the dimensionality of the DataFrame.
See Also
--------
ndarray.shape : Tuple of array dimensions.
Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.shape
(2, 2)
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
... 'col3': [5, 6]})
>>> df.shape
(2, 3)
- style
- Returns a Styler object.
Contains methods for building a styled HTML representation of the DataFrame.
See Also
--------
io.formats.style.Styler : Helps style a DataFrame or Series according to the
data with HTML and CSS.
- values
- Return a Numpy representation of the DataFrame.
.. warning::
We recommend using :meth:`DataFrame.to_numpy` instead.
Only the values in the DataFrame will be returned, the axes labels
will be removed.
Returns
-------
numpy.ndarray
The values of the DataFrame.
See Also
--------
DataFrame.to_numpy : Recommended alternative to this method.
DataFrame.index : Retrieve the index labels.
DataFrame.columns : Retrieving the column names.
Notes
-----
The dtype will be a lower-common-denominator dtype (implicit
upcasting); that is to say if the dtypes (even of numeric types)
are mixed, the one that accommodates all will be chosen. Use this
with care if you are not dealing with the blocks.
e.g. If the dtypes are float16 and float32, dtype will be upcast to
float32. If dtypes are int32 and uint8, dtype will be upcast to
int32. By :func:`numpy.find_common_type` convention, mixing int64
and uint64 will result in a float64 dtype.
Examples
--------
A DataFrame where all columns are the same type (e.g., int64) results
in an array of the same type.
>>> df = pd.DataFrame({'age': [ 3, 29],
... 'height': [94, 170],
... 'weight': [31, 115]})
>>> df
age height weight
0 3 94 31
1 29 170 115
>>> df.dtypes
age int64
height int64
weight int64
dtype: object
>>> df.values
array([[ 3, 94, 31],
[ 29, 170, 115]])
A DataFrame with mixed type columns(e.g., str/object, int64, float32)
results in an ndarray of the broadest type that accommodates these
mixed types (e.g., object).
>>> df2 = pd.DataFrame([('parrot', 24.0, 'second'),
... ('lion', 80.5, 1),
... ('monkey', np.nan, None)],
... columns=('name', 'max_speed', 'rank'))
>>> df2.dtypes
name object
max_speed float64
rank object
dtype: object
>>> df2.values
array([['parrot', 24.0, 'second'],
['lion', 80.5, 1],
['monkey', nan, None]], dtype=object)
Data descriptors defined here:
- columns
- The column labels of the DataFrame.
- index
- The index (row labels) of the DataFrame.
Data and other attributes defined here:
- __annotations__ = {'_AXIS_TO_AXIS_NUMBER': 'dict[Axis, int]', '_accessors': 'set[str]', '_constructor_sliced': 'Callable[..., Series]', '_hidden_attrs': 'frozenset[str]', '_mgr': 'BlockManager | ArrayManager', 'columns': 'Index', 'index': 'Index'}
- plot = <class 'pandas.plotting._core.PlotAccessor'>
- Make plots of Series or DataFrame.
Uses the backend specified by the
option ``plotting.backend``. By default, matplotlib is used.
Parameters
----------
data : Series or DataFrame
The object for which the method is called.
x : label or position, default None
Only used if data is a DataFrame.
y : label, position or list of label, positions, default None
Allows plotting of one column versus another. Only used if data is a
DataFrame.
kind : str
The kind of plot to produce:
- 'line' : line plot (default)
- 'bar' : vertical bar plot
- 'barh' : horizontal bar plot
- 'hist' : histogram
- 'box' : boxplot
- 'kde' : Kernel Density Estimation plot
- 'density' : same as 'kde'
- 'area' : area plot
- 'pie' : pie plot
- 'scatter' : scatter plot (DataFrame only)
- 'hexbin' : hexbin plot (DataFrame only)
ax : matplotlib axes object, default None
An axes of the current figure.
subplots : bool, default False
Make separate subplots for each column.
sharex : bool, default True if ax is None else False
In case ``subplots=True``, share x axis and set some x axis labels
to invisible; defaults to True if ax is None otherwise False if
an ax is passed in; Be aware, that passing in both an ax and
``sharex=True`` will alter all x axis labels for all axis in a figure.
sharey : bool, default False
In case ``subplots=True``, share y axis and set some y axis labels to invisible.
layout : tuple, optional
(rows, columns) for the layout of subplots.
figsize : a tuple (width, height) in inches
Size of a figure object.
use_index : bool, default True
Use index as ticks for x axis.
title : str or list
Title to use for the plot. If a string is passed, print the string
at the top of the figure. If a list is passed and `subplots` is
True, print each item in the list above the corresponding subplot.
grid : bool, default None (matlab style default)
Axis grid lines.
legend : bool or {'reverse'}
Place legend on axis subplots.
style : list or dict
The matplotlib line style per column.
logx : bool or 'sym', default False
Use log scaling or symlog scaling on x axis.
.. versionchanged:: 0.25.0
logy : bool or 'sym' default False
Use log scaling or symlog scaling on y axis.
.. versionchanged:: 0.25.0
loglog : bool or 'sym', default False
Use log scaling or symlog scaling on both x and y axes.
.. versionchanged:: 0.25.0
xticks : sequence
Values to use for the xticks.
yticks : sequence
Values to use for the yticks.
xlim : 2-tuple/list
Set the x limits of the current axes.
ylim : 2-tuple/list
Set the y limits of the current axes.
xlabel : label, optional
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the
x-column name for planar plots.
.. versionadded:: 1.1.0
.. versionchanged:: 1.2.0
Now applicable to planar plots (`scatter`, `hexbin`).
ylabel : label, optional
Name to use for the ylabel on y-axis. Default will show no ylabel, or the
y-column name for planar plots.
.. versionadded:: 1.1.0
.. versionchanged:: 1.2.0
Now applicable to planar plots (`scatter`, `hexbin`).
rot : int, default None
Rotation for ticks (xticks for vertical, yticks for horizontal
plots).
fontsize : int, default None
Font size for xticks and yticks.
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that
name from matplotlib.
colorbar : bool, optional
If True, plot colorbar (only relevant for 'scatter' and 'hexbin'
plots).
position : float
Specify relative alignments for bar plot layout.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center).
table : bool, Series or DataFrame, default False
If True, draw a table using the data in the DataFrame and the data
will be transposed to meet matplotlib's default layout.
If a Series or DataFrame is passed, use passed data to draw a
table.
yerr : DataFrame, Series, array-like, dict and str
See :ref:`Plotting with Error Bars <visualization.errorbars>` for
detail.
xerr : DataFrame, Series, array-like, dict and str
Equivalent to yerr.
stacked : bool, default False in line and bar plots, and True in area plot
If True, create stacked plot.
sort_columns : bool, default False
Sort column names to determine plot ordering.
secondary_y : bool or sequence, default False
Whether to plot on the secondary y-axis if a list/tuple, which
columns to plot on secondary y-axis.
mark_right : bool, default True
When using a secondary_y axis, automatically mark the column
labels with "(right)" in the legend.
include_bool : bool, default is False
If True, boolean values can be plotted.
backend : str, default None
Backend to use instead of the backend specified in the option
``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
specify the ``plotting.backend`` for the whole session, set
``pd.options.plotting.backend``.
.. versionadded:: 1.0.0
**kwargs
Options to pass to matplotlib plotting method.
Returns
-------
:class:`matplotlib.axes.Axes` or numpy.ndarray of them
If the backend is not the default matplotlib one, the return value
will be the object returned by the backend.
Notes
-----
- See matplotlib documentation online for more on this subject
- If `kind` = 'bar' or 'barh', you can specify relative alignments
for bar plot layout by `position` keyword.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center)
- sparse = <class 'pandas.core.arrays.sparse.accessor.SparseFrameAccessor'>
- DataFrame accessor for sparse data.
.. versionadded:: 0.25.0
Methods inherited from pandas.core.generic.NDFrame:
- __abs__(self: 'NDFrameT') -> 'NDFrameT'
- __array__(self, dtype: 'npt.DTypeLike | None' = None) -> 'np.ndarray'
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str', *inputs: 'Any', **kwargs: 'Any')
- __array_wrap__(self, result: 'np.ndarray', context: 'tuple[Callable, tuple[Any, ...], int] | None' = None)
- Gets called after a ufunc and other functions.
Parameters
----------
result: np.ndarray
The result of the ufunc or other function called on the NumPy array
returned by __array__
context: tuple of (func, tuple, int)
This parameter is returned by ufuncs as a 3-element tuple: (name of the
ufunc, arguments of the ufunc, domain of the ufunc), but is not set by
other numpy functions.q
Notes
-----
Series implements __array_ufunc_ so this not called for ufunc on Series.
- __bool__ = __nonzero__(self)
- __contains__(self, key) -> 'bool_t'
- True if the key is in the info axis
- __copy__(self: 'NDFrameT', deep: 'bool_t' = True) -> 'NDFrameT'
- __deepcopy__(self: 'NDFrameT', memo=None) -> 'NDFrameT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __delitem__(self, key) -> 'None'
- Delete item
- __finalize__(self: 'NDFrameT', other, method: 'str | None' = None, **kwargs) -> 'NDFrameT'
- Propagate metadata from other to self.
Parameters
----------
other : the object from which to get the attributes that we are going
to propagate
method : str, optional
A passed method name providing context on where ``__finalize__``
was called.
.. warning::
The value passed as `method` are not currently considered
stable across pandas releases.
- __getattr__(self, name: 'str')
- After regular attribute access, try looking up the name
This allows simpler access to columns for interactive use.
- __getstate__(self) -> 'dict[str, Any]'
- __iadd__(self, other)
- __iand__(self, other)
- __ifloordiv__(self, other)
- __imod__(self, other)
- __imul__(self, other)
- __invert__(self)
- __ior__(self, other)
- __ipow__(self, other)
- __isub__(self, other)
- __iter__(self)
- Iterate over info axis.
Returns
-------
iterator
Info axis as iterator.
- __itruediv__(self, other)
- __ixor__(self, other)
- __neg__(self)
- __nonzero__(self)
- __pos__(self)
- __round__(self: 'NDFrameT', decimals: 'int' = 0) -> 'NDFrameT'
- __setattr__(self, name: 'str', value) -> 'None'
- After regular attribute access, try setting the name
This allows simpler access to columns for interactive use.
- __setstate__(self, state)
- abs(self: 'NDFrameT') -> 'NDFrameT'
- Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
Returns
-------
abs
Series/DataFrame containing the absolute value of each element.
See Also
--------
numpy.absolute : Calculate the absolute value element-wise.
Notes
-----
For ``complex`` inputs, ``1.2 + 1j``, the absolute value is
:math:`\sqrt{ a^2 + b^2 }`.
Examples
--------
Absolute numeric values in a Series.
>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0 1.10
1 2.00
2 3.33
3 4.00
dtype: float64
Absolute numeric values in a Series with complex numbers.
>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0 1.56205
dtype: float64
Absolute numeric values in a Series with a Timedelta element.
>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0 1 days
dtype: timedelta64[ns]
Select rows with data closest to certain value using argsort (from
`StackOverflow <https://stackoverflow.com/a/17758115>`__).
>>> df = pd.DataFrame({
... 'a': [4, 5, 6, 7],
... 'b': [10, 20, 30, 40],
... 'c': [100, 50, -30, -50]
... })
>>> df
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
>>> df.loc[(df.c - 43).abs().argsort()]
a b c
1 5 20 50
0 4 10 100
2 6 30 -30
3 7 40 -50
- add_prefix(self: 'NDFrameT', prefix: 'str') -> 'NDFrameT'
- Prefix labels with string `prefix`.
For Series, the row labels are prefixed.
For DataFrame, the column labels are prefixed.
Parameters
----------
prefix : str
The string to add before each label.
Returns
-------
Series or DataFrame
New Series or DataFrame with updated labels.
See Also
--------
Series.add_suffix: Suffix row labels with string `suffix`.
DataFrame.add_suffix: Suffix column labels with string `suffix`.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.add_prefix('item_')
item_0 1
item_1 2
item_2 3
item_3 4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
A B
0 1 3
1 2 4
2 3 5
3 4 6
>>> df.add_prefix('col_')
col_A col_B
0 1 3
1 2 4
2 3 5
3 4 6
- add_suffix(self: 'NDFrameT', suffix: 'str') -> 'NDFrameT'
- Suffix labels with string `suffix`.
For Series, the row labels are suffixed.
For DataFrame, the column labels are suffixed.
Parameters
----------
suffix : str
The string to add after each label.
Returns
-------
Series or DataFrame
New Series or DataFrame with updated labels.
See Also
--------
Series.add_prefix: Prefix row labels with string `prefix`.
DataFrame.add_prefix: Prefix column labels with string `prefix`.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.add_suffix('_item')
0_item 1
1_item 2
2_item 3
3_item 4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
A B
0 1 3
1 2 4
2 3 5
3 4 6
>>> df.add_suffix('_col')
A_col B_col
0 1 3
1 2 4
2 3 5
3 4 6
- asof(self, where, subset=None)
- Return the last row(s) without any NaNs before `where`.
The last row (for each element in `where`, if list) without any
NaN is taken.
In case of a :class:`~pandas.DataFrame`, the last row without NaN
considering only the subset of columns (if not `None`)
If there is no good value, NaN is returned for a Series or
a Series of NaN values for a DataFrame
Parameters
----------
where : date or array-like of dates
Date(s) before which the last row(s) are returned.
subset : str or array-like of str, default `None`
For DataFrame, if not `None`, only use these columns to
check for NaNs.
Returns
-------
scalar, Series, or DataFrame
The return can be:
* scalar : when `self` is a Series and `where` is a scalar
* Series: when `self` is a Series and `where` is an array-like,
or when `self` is a DataFrame and `where` is a scalar
* DataFrame : when `self` is a DataFrame and `where` is an
array-like
Return scalar, Series, or DataFrame.
See Also
--------
merge_asof : Perform an asof merge. Similar to left join.
Notes
-----
Dates are assumed to be sorted. Raises if this is not the case.
Examples
--------
A Series and a scalar `where`.
>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10 1.0
20 2.0
30 NaN
40 4.0
dtype: float64
>>> s.asof(20)
2.0
For a sequence `where`, a Series is returned. The first value is
NaN, because the first element of `where` is before the first
index value.
>>> s.asof([5, 20])
5 NaN
20 2.0
dtype: float64
Missing values are not considered. The following is ``2.0``, not
NaN, even though NaN is at the index location for ``30``.
>>> s.asof(30)
2.0
Take all columns into consideration
>>> df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
... 'b': [None, None, None, None, 500]},
... index=pd.DatetimeIndex(['2018-02-27 09:01:00',
... '2018-02-27 09:02:00',
... '2018-02-27 09:03:00',
... '2018-02-27 09:04:00',
... '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
... '2018-02-27 09:04:30']))
a b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN
Take a single column into consideration
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
... '2018-02-27 09:04:30']),
... subset=['a'])
a b
2018-02-27 09:03:30 30.0 NaN
2018-02-27 09:04:30 40.0 NaN
- astype(self: 'NDFrameT', dtype, copy: 'bool_t' = True, errors: 'str' = 'raise') -> 'NDFrameT'
- Cast a pandas object to a specified dtype ``dtype``.
Parameters
----------
dtype : data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast entire pandas object to
the same type. Alternatively, use {col: dtype, ...}, where col is a
column label and dtype is a numpy.dtype or Python type to cast one
or more of the DataFrame's columns to column-specific types.
copy : bool, default True
Return a copy when ``copy=True`` (be very careful setting
``copy=False`` as changes to values then may propagate to other
pandas objects).
errors : {'raise', 'ignore'}, default 'raise'
Control raising of exceptions on invalid data for provided dtype.
- ``raise`` : allow exceptions to be raised
- ``ignore`` : suppress exceptions. On error return original object.
Returns
-------
casted : same type as caller
See Also
--------
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.
Notes
-----
.. deprecated:: 1.3.0
Using ``astype`` to convert from timezone-naive dtype to
timezone-aware dtype is deprecated and will raise in a
future version. Use :meth:`Series.dt.tz_localize` instead.
Examples
--------
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1 int64
col2 int64
dtype: object
Cast all columns to int32:
>>> df.astype('int32').dtypes
col1 int32
col2 int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({'col1': 'int32'}).dtypes
col1 int32
col2 int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0 1
1 2
dtype: int32
>>> ser.astype('int64')
0 1
1 2
dtype: int64
Convert to categorical type:
>>> ser.astype('category')
0 1
1 2
dtype: category
Categories (2, int64): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(
... categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
Note that using ``copy=False`` and changing data on a new
pandas object may propagate changes:
>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1 # note that s1[0] has changed too
0 10
1 2
dtype: int64
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0 2020-01-01
1 2020-01-02
2 2020-01-03
dtype: datetime64[ns]
- at_time(self: 'NDFrameT', time, asof: 'bool_t' = False, axis=None) -> 'NDFrameT'
- Select values at particular time of day (e.g., 9:30AM).
Parameters
----------
time : datetime.time or str
axis : {0 or 'index', 1 or 'columns'}, default 0
Returns
-------
Series or DataFrame
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
between_time : Select values between particular times of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_at_time : Get just the index locations for
values at particular time of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 00:00:00 1
2018-04-09 12:00:00 2
2018-04-10 00:00:00 3
2018-04-10 12:00:00 4
>>> ts.at_time('12:00')
A
2018-04-09 12:00:00 2
2018-04-10 12:00:00 4
- backfill = bfill(self: 'NDFrameT', axis: 'None | Axis' = None, inplace: 'bool_t' = False, limit: 'None | int' = None, downcast=None) -> 'NDFrameT | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='bfill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- between_time(self: 'NDFrameT', start_time, end_time, include_start: 'bool_t | lib.NoDefault' = <no_default>, include_end: 'bool_t | lib.NoDefault' = <no_default>, inclusive: 'str | None' = None, axis=None) -> 'NDFrameT'
- Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting ``start_time`` to be later than ``end_time``,
you can get the times that are *not* between the two times.
Parameters
----------
start_time : datetime.time or str
Initial time as a time filter limit.
end_time : datetime.time or str
End time as a time filter limit.
include_start : bool, default True
Whether the start time needs to be included in the result.
.. deprecated:: 1.4.0
Arguments `include_start` and `include_end` have been deprecated
to standardize boundary inputs. Use `inclusive` instead, to set
each bound as closed or open.
include_end : bool, default True
Whether the end time needs to be included in the result.
.. deprecated:: 1.4.0
Arguments `include_start` and `include_end` have been deprecated
to standardize boundary inputs. Use `inclusive` instead, to set
each bound as closed or open.
inclusive : {"both", "neither", "left", "right"}, default "both"
Include boundaries; whether to set each bound as closed or open.
axis : {0 or 'index', 1 or 'columns'}, default 0
Determine range time on index or columns value.
Returns
-------
Series or DataFrame
Data from the original object filtered to the specified dates range.
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
at_time : Select values at a particular time of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_between_time : Get just the index locations for
values between particular times of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 00:00:00 1
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
2018-04-12 01:00:00 4
>>> ts.between_time('0:15', '0:45')
A
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
You get the times that are *not* between two times by setting
``start_time`` later than ``end_time``:
>>> ts.between_time('0:45', '0:15')
A
2018-04-09 00:00:00 1
2018-04-12 01:00:00 4
- bool(self)
- Return the bool of a single element Series or DataFrame.
This must be a boolean scalar value, either True or False. It will raise a
ValueError if the Series or DataFrame does not have exactly 1 element, or that
element is not boolean (integer values 0 and 1 will also raise an exception).
Returns
-------
bool
The value in the Series or DataFrame.
See Also
--------
Series.astype : Change the data type of a Series, including to boolean.
DataFrame.astype : Change the data type of a DataFrame, including to boolean.
numpy.bool_ : NumPy boolean data type, used by pandas for boolean values.
Examples
--------
The method will only work for single element objects with a boolean value:
>>> pd.Series([True]).bool()
True
>>> pd.Series([False]).bool()
False
>>> pd.DataFrame({'col': [True]}).bool()
True
>>> pd.DataFrame({'col': [False]}).bool()
False
- convert_dtypes(self: 'NDFrameT', infer_objects: 'bool_t' = True, convert_string: 'bool_t' = True, convert_integer: 'bool_t' = True, convert_boolean: 'bool_t' = True, convert_floating: 'bool_t' = True) -> 'NDFrameT'
- Convert columns to best possible dtypes using dtypes supporting ``pd.NA``.
.. versionadded:: 1.0.0
Parameters
----------
infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.
convert_string : bool, default True
Whether object dtypes should be converted to ``StringDtype()``.
convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.
convert_boolean : bool, defaults True
Whether object dtypes should be converted to ``BooleanDtypes()``.
convert_floating : bool, defaults True
Whether, if possible, conversion can be done to floating extension types.
If `convert_integer` is also True, preference will be give to integer
dtypes if the floats can be faithfully casted to integers.
.. versionadded:: 1.2.0
Returns
-------
Series or DataFrame
Copy of input object with new dtype.
See Also
--------
infer_objects : Infer dtypes of objects.
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
Notes
-----
By default, ``convert_dtypes`` will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support ``pd.NA``. By using the options
``convert_string``, ``convert_integer``, ``convert_boolean`` and
``convert_boolean``, it is possible to turn off individual conversions
to ``StringDtype``, the integer extension types, ``BooleanDtype``
or floating extension types, respectively.
For object-dtyped columns, if ``infer_objects`` is ``True``, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to ``StringDtype``, ``BooleanDtype`` or an appropriate integer
or floating extension type, otherwise leave as ``object``.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
.. versionchanged:: 1.2
Starting with pandas 1.2, this method also converts float columns
to the nullable floating extension type.
In the future, as new dtypes are added that support ``pd.NA``, the results
of this method will change to support those new dtypes.
Examples
--------
>>> df = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
... "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
... "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
... "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
... }
... )
Start with a DataFrame with default dtypes.
>>> df
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
>>> df.dtypes
a int32
b object
c object
d object
e float64
f float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
>>> dfn.dtypes
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
Start with a Series of strings and missing data represented by ``np.nan``.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0 a
1 b
2 NaN
dtype: object
Obtain a Series with dtype ``StringDtype``.
>>> s.convert_dtypes()
0 a
1 b
2 <NA>
dtype: string
- copy(self: 'NDFrameT', deep: 'bool_t' = True) -> 'NDFrameT'
- Make a copy of this object's indices and data.
When ``deep=True`` (default), a new object will be created with a
copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
When ``deep=False``, a new object will be created without copying
the calling object's data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).
Parameters
----------
deep : bool, default True
Make a deep copy, including a copy of the data and the indices.
With ``deep=False`` neither the indices nor the data are copied.
Returns
-------
copy : Series or DataFrame
Object type matches caller.
Notes
-----
When ``deep=True``, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data (see examples below).
While ``Index`` objects are copied when ``deep=True``, the underlying
numpy array is not copied for performance reasons. Since ``Index`` is
immutable, the underlying data can be safely shared and a copy
is not needed.
Examples
--------
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a 1
b 2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a 1
b 2
dtype: int64
**Shallow copy versus default (deep) copy:**
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)
Shallow copy shares data and index with original.
>>> s is shallow
False
>>> s.values is shallow.values and s.index is shallow.index
True
Deep copy has own copy of data and index.
>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False
Updates to the data shared by shallow copy and original is reflected
in both; deep copy remains unchanged.
>>> s[0] = 3
>>> shallow[1] = 4
>>> s
a 3
b 4
dtype: int64
>>> shallow
a 3
b 4
dtype: int64
>>> deep
a 1
b 2
dtype: int64
Note that when copying an object containing Python objects, a deep copy
will copy the data, but will not do so recursively. Updating a nested
data object will be reflected in the deep copy.
>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0 [10, 2]
1 [3, 4]
dtype: object
>>> deep
0 [10, 2]
1 [3, 4]
dtype: object
- describe(self: 'NDFrameT', percentiles=None, include=None, exclude=None, datetime_is_numeric=False) -> 'NDFrameT'
- Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset's distribution, excluding ``NaN`` values.
Analyzes both numeric and object series, as well
as ``DataFrame`` column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
Parameters
----------
percentiles : list-like of numbers, optional
The percentiles to include in the output. All should
fall between 0 and 1. The default is
``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for ``Series``. Here are the options:
- 'all' : All columns of the input will be included in the output.
- A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
``numpy.number``. To limit it instead to object columns submit
the ``numpy.object`` data type. Strings
can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
select pandas categorical columns, use ``'category'``
- None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional,
A black list of data types to omit from the result. Ignored
for ``Series``. Here are the options:
- A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
``numpy.number``. To exclude object columns submit the data
type ``numpy.object``. Strings can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(exclude=['O'])``). To
exclude pandas categorical columns, use ``'category'``
- None (default) : The result will exclude nothing.
datetime_is_numeric : bool, default False
Whether to treat datetime dtypes as numeric. This affects statistics
calculated for the column. For DataFrame input, this also
controls whether datetime columns are included by default.
.. versionadded:: 1.1.0
Returns
-------
Series or DataFrame
Summary statistics of the Series or Dataframe provided.
See Also
--------
DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.select_dtypes: Subset of a DataFrame including/excluding
columns based on their dtype.
Notes
-----
For numeric data, the result's index will include ``count``,
``mean``, ``std``, ``min``, ``max`` as well as lower, ``50`` and
upper percentiles. By default the lower percentile is ``25`` and the
upper percentile is ``75``. The ``50`` percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result's index
will include ``count``, ``unique``, ``top``, and ``freq``. The ``top``
is the most common value. The ``freq`` is the most common value's
frequency. Timestamps also include the ``first`` and ``last`` items.
If multiple object values have the highest count, then the
``count`` and ``top`` results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a ``DataFrame``, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If ``include='all'`` is provided as an option, the result
will include a union of attributes of each type.
The `include` and `exclude` parameters can be used to limit
which columns in a ``DataFrame`` are analyzed for the output.
The parameters are ignored when analyzing a ``Series``.
Examples
--------
Describing a numeric ``Series``.
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
Describing a categorical ``Series``.
>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count 4
unique 3
top a
freq 2
dtype: object
Describing a timestamp ``Series``.
>>> s = pd.Series([
... np.datetime64("2000-01-01"),
... np.datetime64("2010-01-01"),
... np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count 3
mean 2006-09-01 08:00:00
min 2000-01-01 00:00:00
25% 2004-12-31 12:00:00
50% 2010-01-01 00:00:00
75% 2010-01-01 00:00:00
max 2010-01-01 00:00:00
dtype: object
Describing a ``DataFrame``. By default only numeric fields
are returned.
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
... 'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Describing all columns of a ``DataFrame`` regardless of data type.
>>> df.describe(include='all') # doctest: +SKIP
categorical numeric object
count 3 3.0 3
unique 3 NaN 3
top f NaN a
freq 1 NaN 1
mean NaN 2.0 NaN
std NaN 1.0 NaN
min NaN 1.0 NaN
25% NaN 1.5 NaN
50% NaN 2.0 NaN
75% NaN 2.5 NaN
max NaN 3.0 NaN
Describing a column from a ``DataFrame`` by accessing it as
an attribute.
>>> df.numeric.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
Including only numeric columns in a ``DataFrame`` description.
>>> df.describe(include=[np.number])
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Including only string columns in a ``DataFrame`` description.
>>> df.describe(include=[object]) # doctest: +SKIP
object
count 3
unique 3
top a
freq 1
Including only categorical columns from a ``DataFrame`` description.
>>> df.describe(include=['category'])
categorical
count 3
unique 3
top d
freq 1
Excluding numeric columns from a ``DataFrame`` description.
>>> df.describe(exclude=[np.number]) # doctest: +SKIP
categorical object
count 3 3
unique 3 3
top f a
freq 1 1
Excluding object columns from a ``DataFrame`` description.
>>> df.describe(exclude=[object]) # doctest: +SKIP
categorical numeric
count 3 3.0
unique 3 NaN
top f NaN
freq 1 NaN
mean NaN 2.0
std NaN 1.0
min NaN 1.0
25% NaN 1.5
50% NaN 2.0
75% NaN 2.5
max NaN 3.0
- droplevel(self: 'NDFrameT', level, axis=0) -> 'NDFrameT'
- Return Series/DataFrame with requested index / column level(s) removed.
Parameters
----------
level : int, str, or list-like
If a string is given, must be the name of a level
If list-like, elements must be names or positional indexes
of levels.
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis along which the level(s) is removed:
* 0 or 'index': remove level(s) in column.
* 1 or 'columns': remove level(s) in row.
Returns
-------
Series/DataFrame
Series/DataFrame with requested index / column level(s) removed.
Examples
--------
>>> df = pd.DataFrame([
... [1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
>>> df.columns = pd.MultiIndex.from_tuples([
... ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
>>> df
level_1 c d
level_2 e f
a b
1 2 3 4
5 6 7 8
9 10 11 12
>>> df.droplevel('a')
level_1 c d
level_2 e f
b
2 3 4
6 7 8
10 11 12
>>> df.droplevel('level_2', axis=1)
level_1 c d
a b
1 2 3 4
5 6 7 8
9 10 11 12
- equals(self, other: 'object') -> 'bool_t'
- Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against
each other to see if they have the same shape and elements. NaNs in
the same location are considered equal.
The row/column index do not need to have the same type, as long
as the values are considered equal. Corresponding columns must be of
the same dtype.
Parameters
----------
other : Series or DataFrame
The other Series or DataFrame to be compared with the first.
Returns
-------
bool
True if all elements are the same in both objects, False
otherwise.
See Also
--------
Series.eq : Compare two Series objects of the same length
and return a Series where each element is True if the element
in each Series is equal, False otherwise.
DataFrame.eq : Compare two DataFrame objects of the same shape and
return a DataFrame where each element is True if the respective
element in each DataFrame is equal, False otherwise.
testing.assert_series_equal : Raises an AssertionError if left and
right are not equal. Provides an easy interface to ignore
inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal : Like assert_series_equal, but targets
DataFrames.
numpy.array_equal : Return True if two arrays have the same shape
and elements, False otherwise.
Examples
--------
>>> df = pd.DataFrame({1: [10], 2: [20]})
>>> df
1 2
0 10 20
DataFrames df and exactly_equal have the same types and values for
their elements and column labels, which will return True.
>>> exactly_equal = pd.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
1 2
0 10 20
>>> df.equals(exactly_equal)
True
DataFrames df and different_column_type have the same element
types and values, but have different types for the column labels,
which will still return True.
>>> different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
1.0 2.0
0 10 20
>>> df.equals(different_column_type)
True
DataFrames df and different_data_type have different types for the
same values for their elements, and will return False even though
their column labels are the same values and types.
>>> different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})
>>> different_data_type
1 2
0 10.0 20.0
>>> df.equals(different_data_type)
False
- ewm(self, com: 'float | None' = None, span: 'float | None' = None, halflife: 'float | TimedeltaConvertibleTypes | None' = None, alpha: 'float | None' = None, min_periods: 'int | None' = 0, adjust: 'bool_t' = True, ignore_na: 'bool_t' = False, axis: 'Axis' = 0, times: 'str | np.ndarray | DataFrame | Series | None' = None, method: 'str' = 'single') -> 'ExponentialMovingWindow'
- Provide exponentially weighted (EW) calculations.
Exactly one parameter: ``com``, ``span``, ``halflife``, or ``alpha`` must be
provided.
Parameters
----------
com : float, optional
Specify decay in terms of center of mass
:math:`\alpha = 1 / (1 + com)`, for :math:`com \geq 0`.
span : float, optional
Specify decay in terms of span
:math:`\alpha = 2 / (span + 1)`, for :math:`span \geq 1`.
halflife : float, str, timedelta, optional
Specify decay in terms of half-life
:math:`\alpha = 1 - \exp\left(-\ln(2) / halflife\right)`, for
:math:`halflife > 0`.
If ``times`` is specified, the time unit (str or timedelta) over which an
observation decays to half its value. Only applicable to ``mean()``,
and halflife value will not apply to the other functions.
.. versionadded:: 1.1.0
alpha : float, optional
Specify smoothing factor :math:`\alpha` directly
:math:`0 < \alpha \leq 1`.
min_periods : int, default 0
Minimum number of observations in window required to have a value;
otherwise, result is ``np.nan``.
adjust : bool, default True
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings (viewing EWMA as a moving average).
- When ``adjust=True`` (default), the EW function is calculated using weights
:math:`w_i = (1 - \alpha)^i`. For example, the EW moving average of the series
[:math:`x_0, x_1, ..., x_t`] would be:
.. math::
y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 -
\alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}
- When ``adjust=False``, the exponentially weighted function is calculated
recursively:
.. math::
\begin{split}
y_0 &= x_0\\
y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,
\end{split}
ignore_na : bool, default False
Ignore missing values when calculating weights.
- When ``ignore_na=False`` (default), weights are based on absolute positions.
For example, the weights of :math:`x_0` and :math:`x_2` used in calculating
the final weighted average of [:math:`x_0`, None, :math:`x_2`] are
:math:`(1-\alpha)^2` and :math:`1` if ``adjust=True``, and
:math:`(1-\alpha)^2` and :math:`\alpha` if ``adjust=False``.
- When ``ignore_na=True``, weights are based
on relative positions. For example, the weights of :math:`x_0` and :math:`x_2`
used in calculating the final weighted average of
[:math:`x_0`, None, :math:`x_2`] are :math:`1-\alpha` and :math:`1` if
``adjust=True``, and :math:`1-\alpha` and :math:`\alpha` if ``adjust=False``.
axis : {0, 1}, default 0
If ``0`` or ``'index'``, calculate across the rows.
If ``1`` or ``'columns'``, calculate across the columns.
times : str, np.ndarray, Series, default None
.. versionadded:: 1.1.0
Only applicable to ``mean()``.
Times corresponding to the observations. Must be monotonically increasing and
``datetime64[ns]`` dtype.
If 1-D array like, a sequence with the same shape as the observations.
.. deprecated:: 1.4.0
If str, the name of the column in the DataFrame representing the times.
method : str {'single', 'table'}, default 'single'
.. versionadded:: 1.4.0
Execute the rolling operation per single column or row (``'single'``)
or over the entire object (``'table'``).
This argument is only implemented when specifying ``engine='numba'``
in the method call.
Only applicable to ``mean()``
Returns
-------
``ExponentialMovingWindow`` subclass
See Also
--------
rolling : Provides rolling window calculations.
expanding : Provides expanding transformations.
Notes
-----
See :ref:`Windowing Operations <window.exponentially_weighted>`
for further usage details and examples.
Examples
--------
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
>>> df.ewm(com=0.5).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
>>> df.ewm(alpha=2 / 3).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
**adjust**
>>> df.ewm(com=0.5, adjust=True).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
>>> df.ewm(com=0.5, adjust=False).mean()
B
0 0.000000
1 0.666667
2 1.555556
3 1.555556
4 3.650794
**ignore_na**
>>> df.ewm(com=0.5, ignore_na=True).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.225000
>>> df.ewm(com=0.5, ignore_na=False).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
**times**
Exponentially weighted mean with weights calculated with a timedelta ``halflife``
relative to ``times``.
>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
B
0 0.000000
1 0.585786
2 1.523889
3 1.523889
4 3.233686
- expanding(self, min_periods: 'int' = 1, center: 'bool_t | None' = None, axis: 'Axis' = 0, method: 'str' = 'single') -> 'Expanding'
- Provide expanding window calculations.
Parameters
----------
min_periods : int, default 1
Minimum number of observations in window required to have a value;
otherwise, result is ``np.nan``.
center : bool, default False
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
.. deprecated:: 1.1.0
axis : int or str, default 0
If ``0`` or ``'index'``, roll across the rows.
If ``1`` or ``'columns'``, roll across the columns.
method : str {'single', 'table'}, default 'single'
Execute the rolling operation per single column or row (``'single'``)
or over the entire object (``'table'``).
This argument is only implemented when specifying ``engine='numba'``
in the method call.
.. versionadded:: 1.3.0
Returns
-------
``Expanding`` subclass
See Also
--------
rolling : Provides rolling window calculations.
ewm : Provides exponential weighted functions.
Notes
-----
See :ref:`Windowing Operations <window.expanding>` for further usage details
and examples.
Examples
--------
>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
**min_periods**
Expanding sum with 1 vs 3 observations needed to calculate a value.
>>> df.expanding(1).sum()
B
0 0.0
1 1.0
2 3.0
3 3.0
4 7.0
>>> df.expanding(3).sum()
B
0 NaN
1 NaN
2 3.0
3 3.0
4 7.0
- filter(self: 'NDFrameT', items=None, like: 'str | None' = None, regex: 'str | None' = None, axis=None) -> 'NDFrameT'
- Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
Parameters
----------
items : list-like
Keep labels from axis which are in items.
like : str
Keep labels from axis for which "like in label == True".
regex : str (regular expression)
Keep labels from axis for which re.search(regex, label) == True.
axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis,
'index' for Series, 'columns' for DataFrame.
Returns
-------
same type as input object
See Also
--------
DataFrame.loc : Access a group of rows and columns
by label(s) or a boolean array.
Notes
-----
The ``items``, ``like``, and ``regex`` parameters are
enforced to be mutually exclusive.
``axis`` defaults to the info axis that is used when indexing
with ``[]``.
Examples
--------
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
... index=['mouse', 'rabbit'],
... columns=['one', 'two', 'three'])
>>> df
one two three
mouse 1 2 3
rabbit 4 5 6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
one three
mouse 1 3
rabbit 4 6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
one three
mouse 1 3
rabbit 4 6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
one two three
rabbit 4 5 6
- first(self: 'NDFrameT', offset) -> 'NDFrameT'
- Select initial periods of time series data based on a date offset.
When having a DataFrame with dates as index, this function can
select the first few rows based on a date offset.
Parameters
----------
offset : str, DateOffset or dateutil.relativedelta
The offset length of the data that will be selected. For instance,
'1M' will display all the rows having their index within the first month.
Returns
-------
Series or DataFrame
A subset of the caller.
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
last : Select final periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 1
2018-04-11 2
2018-04-13 3
2018-04-15 4
Get the rows for the first 3 days:
>>> ts.first('3D')
A
2018-04-09 1
2018-04-11 2
Notice the data for 3 first calendar days were returned, not the first
3 days observed in the dataset, and therefore data for 2018-04-13 was
not returned.
- first_valid_index(self) -> 'Hashable | None'
- Return index for first non-NA value or None, if no non-NA value is found.
Returns
-------
scalar : type of index
Notes
-----
If all elements are non-NA/null, returns None.
Also returns None for empty Series/DataFrame.
- get(self, key, default=None)
- Get item from object for given key (ex: DataFrame column).
Returns default value if not found.
Parameters
----------
key : object
Returns
-------
value : same type as items contained in object
Examples
--------
>>> df = pd.DataFrame(
... [
... [24.3, 75.7, "high"],
... [31, 87.8, "high"],
... [22, 71.6, "medium"],
... [35, 95, "medium"],
... ],
... columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
... index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"),
... )
>>> df
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31.0 87.8 high
2014-02-14 22.0 71.6 medium
2014-02-15 35.0 95.0 medium
>>> df.get(["temp_celsius", "windspeed"])
temp_celsius windspeed
2014-02-12 24.3 high
2014-02-13 31.0 high
2014-02-14 22.0 medium
2014-02-15 35.0 medium
If the key isn't found, the default value will be used.
>>> df.get(["temp_celsius", "temp_kelvin"], default="default_value")
'default_value'
- head(self: 'NDFrameT', n: 'int' = 5) -> 'NDFrameT'
- Return the first `n` rows.
This function returns the first `n` rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of `n`, this function returns all rows except
the last `n` rows, equivalent to ``df[:-n]``.
Parameters
----------
n : int, default 5
Number of rows to select.
Returns
-------
same type as caller
The first `n` rows of the caller object.
See Also
--------
DataFrame.tail: Returns the last `n` rows.
Examples
--------
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the first 5 lines
>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
Viewing the first `n` lines (three in this case)
>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
For negative values of `n`
>>> df.head(-3)
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
- infer_objects(self: 'NDFrameT') -> 'NDFrameT'
- Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
Returns
-------
converted : same type as input object
See Also
--------
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to numeric type.
convert_dtypes : Convert argument to best possible dtype.
Examples
--------
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
A
1 1
2 2
3 3
>>> df.dtypes
A object
dtype: object
>>> df.infer_objects().dtypes
A int64
dtype: object
- keys(self)
- Get the 'info axis' (see Indexing for more).
This is index for Series, columns for DataFrame.
Returns
-------
Index
Info axis.
- last(self: 'NDFrameT', offset) -> 'NDFrameT'
- Select final periods of time series data based on a date offset.
For a DataFrame with a sorted DatetimeIndex, this function
selects the last few rows based on a date offset.
Parameters
----------
offset : str, DateOffset, dateutil.relativedelta
The offset length of the data that will be selected. For instance,
'3D' will display all the rows having their index within the last 3 days.
Returns
-------
Series or DataFrame
A subset of the caller.
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
first : Select initial periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 1
2018-04-11 2
2018-04-13 3
2018-04-15 4
Get the rows for the last 3 days:
>>> ts.last('3D')
A
2018-04-13 3
2018-04-15 4
Notice the data for 3 last calendar days were returned, not the last
3 observed days in the dataset, and therefore data for 2018-04-11 was
not returned.
- last_valid_index(self) -> 'Hashable | None'
- Return index for last non-NA value or None, if no non-NA value is found.
Returns
-------
scalar : type of index
Notes
-----
If all elements are non-NA/null, returns None.
Also returns None for empty Series/DataFrame.
- pad = ffill(self: 'NDFrameT', axis: 'None | Axis' = None, inplace: 'bool_t' = False, limit: 'None | int' = None, downcast=None) -> 'NDFrameT | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='ffill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- pct_change(self: 'NDFrameT', periods=1, fill_method='pad', limit=None, freq=None, **kwargs) -> 'NDFrameT'
- Percentage change between the current and a prior element.
Computes the percentage change from the immediately previous row by
default. This is useful in comparing the percentage of change in a time
series of elements.
Parameters
----------
periods : int, default 1
Periods to shift for forming percent change.
fill_method : str, default 'pad'
How to handle NAs before computing percent changes.
limit : int, default None
The number of consecutive NAs to fill before stopping.
freq : DateOffset, timedelta, or str, optional
Increment to use from time series API (e.g. 'M' or BDay()).
**kwargs
Additional keyword arguments are passed into
`DataFrame.shift` or `Series.shift`.
Returns
-------
chg : Series or DataFrame
The same type as the calling object.
See Also
--------
Series.diff : Compute the difference of two elements in a Series.
DataFrame.diff : Compute the difference of two elements in a DataFrame.
Series.shift : Shift the index by some number of periods.
DataFrame.shift : Shift the index by some number of periods.
Examples
--------
**Series**
>>> s = pd.Series([90, 91, 85])
>>> s
0 90
1 91
2 85
dtype: int64
>>> s.pct_change()
0 NaN
1 0.011111
2 -0.065934
dtype: float64
>>> s.pct_change(periods=2)
0 NaN
1 NaN
2 -0.055556
dtype: float64
See the percentage change in a Series where filling NAs with last
valid observation forward to next valid.
>>> s = pd.Series([90, 91, None, 85])
>>> s
0 90.0
1 91.0
2 NaN
3 85.0
dtype: float64
>>> s.pct_change(fill_method='ffill')
0 NaN
1 0.011111
2 0.000000
3 -0.065934
dtype: float64
**DataFrame**
Percentage change in French franc, Deutsche Mark, and Italian lira from
1980-01-01 to 1980-03-01.
>>> df = pd.DataFrame({
... 'FR': [4.0405, 4.0963, 4.3149],
... 'GR': [1.7246, 1.7482, 1.8519],
... 'IT': [804.74, 810.01, 860.13]},
... index=['1980-01-01', '1980-02-01', '1980-03-01'])
>>> df
FR GR IT
1980-01-01 4.0405 1.7246 804.74
1980-02-01 4.0963 1.7482 810.01
1980-03-01 4.3149 1.8519 860.13
>>> df.pct_change()
FR GR IT
1980-01-01 NaN NaN NaN
1980-02-01 0.013810 0.013684 0.006549
1980-03-01 0.053365 0.059318 0.061876
Percentage of change in GOOG and APPL stock volume. Shows computing
the percentage change between columns.
>>> df = pd.DataFrame({
... '2016': [1769950, 30586265],
... '2015': [1500923, 40912316],
... '2014': [1371819, 41403351]},
... index=['GOOG', 'APPL'])
>>> df
2016 2015 2014
GOOG 1769950 1500923 1371819
APPL 30586265 40912316 41403351
>>> df.pct_change(axis='columns', periods=-1)
2016 2015 2014
GOOG 0.179241 0.094112 NaN
APPL -0.252395 -0.011860 NaN
- pipe(self, func: 'Callable[..., T] | tuple[Callable[..., T], str]', *args, **kwargs) -> 'T'
- Apply chainable functions that expect Series or DataFrames.
Parameters
----------
func : function
Function to apply to the Series/DataFrame.
``args``, and ``kwargs`` are passed into ``func``.
Alternatively a ``(callable, data_keyword)`` tuple where
``data_keyword`` is a string indicating the keyword of
``callable`` that expects the Series/DataFrame.
args : iterable, optional
Positional arguments passed into ``func``.
kwargs : mapping, optional
A dictionary of keyword arguments passed into ``func``.
Returns
-------
object : the return type of ``func``.
See Also
--------
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
Series.map : Apply a mapping correspondence on a
:class:`~pandas.Series`.
Notes
-----
Use ``.pipe`` when chaining together functions that expect
Series, DataFrames or GroupBy objects. Instead of writing
>>> func(g(h(df), arg1=a), arg2=b, arg3=c) # doctest: +SKIP
You can write
>>> (df.pipe(h)
... .pipe(g, arg1=a)
... .pipe(func, arg2=b, arg3=c)
... ) # doctest: +SKIP
If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
data. For example, suppose ``f`` takes its data as ``arg2``:
>>> (df.pipe(h)
... .pipe(g, arg1=a)
... .pipe((func, 'arg2'), arg1=a, arg3=c)
... ) # doctest: +SKIP
- rank(self: 'NDFrameT', axis=0, method: 'str' = 'average', numeric_only: 'bool_t | None | lib.NoDefault' = <no_default>, na_option: 'str' = 'keep', ascending: 'bool_t' = True, pct: 'bool_t' = False) -> 'NDFrameT'
- Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the
ranks of those values.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
Index to direct ranking.
method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like 'min', but rank always increases by 1 between groups.
numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
na_option : {'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign lowest rank to NaN values
* bottom: assign highest rank to NaN values
ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
pct : bool, default False
Whether or not to display the returned rankings in percentile
form.
Returns
-------
same type as caller
Return a Series or DataFrame with data ranks as values.
See Also
--------
core.groupby.GroupBy.rank : Rank of values within each group.
Examples
--------
>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
... 'spider', 'snake'],
... 'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
Animal Number_legs
0 cat 4.0
1 penguin 2.0
2 dog 4.0
3 spider 8.0
4 snake NaN
The following example shows how the method behaves with the above
parameters:
* default_rank: this is the default behaviour obtained without using
any parameter.
* max_rank: setting ``method = 'max'`` the records that have the
same values are ranked using the highest rank (e.g.: since 'cat'
and 'dog' are both in the 2nd and 3rd position, rank 3 is assigned.)
* NA_bottom: choosing ``na_option = 'bottom'``, if there are records
with NaN values they are placed at the bottom of the ranking.
* pct_rank: when setting ``pct = True``, the ranking is expressed as
percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
Animal Number_legs default_rank max_rank NA_bottom pct_rank
0 cat 4.0 2.5 3.0 2.5 0.625
1 penguin 2.0 1.0 1.0 1.0 0.250
2 dog 4.0 2.5 3.0 2.5 0.625
3 spider 8.0 4.0 4.0 4.0 1.000
4 snake NaN NaN NaN 5.0 NaN
- reindex_like(self: 'NDFrameT', other, method: 'str | None' = None, copy: 'bool_t' = True, limit=None, tolerance=None) -> 'NDFrameT'
- Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
Parameters
----------
other : Object of the same data type
Its row and column indices are used to define the new indices
of this object.
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don't fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap.
copy : bool, default True
Return a new object, even if the passed indexes are the same.
limit : int, default None
Maximum number of consecutive labels to fill for inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
Series or DataFrame
Same type as caller, but with changed indices on each axis.
See Also
--------
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex : Change to new indices or expand indices.
Notes
-----
Same as calling
``.reindex(index=other.index, columns=other.columns,...)``.
Examples
--------
>>> df1 = pd.DataFrame([[24.3, 75.7, 'high'],
... [31, 87.8, 'high'],
... [22, 71.6, 'medium'],
... [35, 95, 'medium']],
... columns=['temp_celsius', 'temp_fahrenheit',
... 'windspeed'],
... index=pd.date_range(start='2014-02-12',
... end='2014-02-15', freq='D'))
>>> df1
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31.0 87.8 high
2014-02-14 22.0 71.6 medium
2014-02-15 35.0 95.0 medium
>>> df2 = pd.DataFrame([[28, 'low'],
... [30, 'low'],
... [35.1, 'medium']],
... columns=['temp_celsius', 'windspeed'],
... index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
... '2014-02-15']))
>>> df2
temp_celsius windspeed
2014-02-12 28.0 low
2014-02-13 30.0 low
2014-02-15 35.1 medium
>>> df2.reindex_like(df1)
temp_celsius temp_fahrenheit windspeed
2014-02-12 28.0 NaN low
2014-02-13 30.0 NaN low
2014-02-14 NaN NaN NaN
2014-02-15 35.1 NaN medium
- rename_axis(self, mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)
- Set the name of the axis for the index or columns.
Parameters
----------
mapper : scalar, list-like, optional
Value to set the axis name attribute.
index, columns : scalar, list-like, dict-like or function, optional
A scalar, list-like, dict-like or functions transformations to
apply to that axis' values.
Note that the ``columns`` parameter is not allowed if the
object is a Series. This parameter only apply for DataFrame
type objects.
Use either ``mapper`` and ``axis`` to
specify the axis to target with ``mapper``, or ``index``
and/or ``columns``.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to rename.
copy : bool, default True
Also copy underlying data.
inplace : bool, default False
Modifies the object directly, instead of creating a new Series
or DataFrame.
Returns
-------
Series, DataFrame, or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Series.rename : Alter Series index labels or name.
DataFrame.rename : Alter DataFrame index labels or name.
Index.rename : Set new names on index.
Notes
-----
``DataFrame.rename_axis`` supports two calling conventions
* ``(index=index_mapper, columns=columns_mapper, ...)``
* ``(mapper, axis={'index', 'columns'}, ...)``
The first calling convention will only modify the names of
the index and/or the names of the Index object that is the columns.
In this case, the parameter ``copy`` is ignored.
The second calling convention will modify the names of the
corresponding index if mapper is a list or a scalar.
However, if mapper is dict-like or a function, it will use the
deprecated behavior of modifying the axis *labels*.
We *highly* recommend using keyword arguments to clarify your
intent.
Examples
--------
**Series**
>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0 dog
1 cat
2 monkey
dtype: object
>>> s.rename_axis("animal")
animal
0 dog
1 cat
2 monkey
dtype: object
**DataFrame**
>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
... "num_arms": [0, 0, 2]},
... ["dog", "cat", "monkey"])
>>> df
num_legs num_arms
dog 4 0
cat 4 0
monkey 2 2
>>> df = df.rename_axis("animal")
>>> df
num_legs num_arms
animal
dog 4 0
cat 4 0
monkey 2 2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs num_legs num_arms
animal
dog 4 0
cat 4 0
monkey 2 2
**MultiIndex**
>>> df.index = pd.MultiIndex.from_product([['mammal'],
... ['dog', 'cat', 'monkey']],
... names=['type', 'name'])
>>> df
limbs num_legs num_arms
type name
mammal dog 4 0
cat 4 0
monkey 2 2
>>> df.rename_axis(index={'type': 'class'})
limbs num_legs num_arms
class name
mammal dog 4 0
cat 4 0
monkey 2 2
>>> df.rename_axis(columns=str.upper)
LIMBS num_legs num_arms
type name
mammal dog 4 0
cat 4 0
monkey 2 2
- rolling(self, window: 'int | timedelta | BaseOffset | BaseIndexer', min_periods: 'int | None' = None, center: 'bool_t' = False, win_type: 'str | None' = None, on: 'str | None' = None, axis: 'Axis' = 0, closed: 'str | None' = None, method: 'str' = 'single')
- Provide rolling window calculations.
Parameters
----------
window : int, offset, or BaseIndexer subclass
Size of the moving window.
If an integer, the fixed number of observations used for
each window.
If an offset, the time period of each window. Each
window will be a variable sized based on the observations included in
the time-period. This is only valid for datetimelike indexes.
To learn more about the offsets & frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
If a BaseIndexer subclass, the window boundaries
based on the defined ``get_window_bounds`` method. Additional rolling
keyword arguments, namely ``min_periods``, ``center``, and
``closed`` will be passed to ``get_window_bounds``.
min_periods : int, default None
Minimum number of observations in window required to have a value;
otherwise, result is ``np.nan``.
For a window that is specified by an offset, ``min_periods`` will default to 1.
For a window that is specified by an integer, ``min_periods`` will default
to the size of the window.
center : bool, default False
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
win_type : str, default None
If ``None``, all points are evenly weighted.
If a string, it must be a valid `scipy.signal window function
<https://docs.scipy.org/doc/scipy/reference/signal.windows.html#module-scipy.signal.windows>`__.
Certain Scipy window types require additional parameters to be passed
in the aggregation function. The additional parameters must match
the keywords specified in the Scipy window type method signature.
on : str, optional
For a DataFrame, a column label or Index level on which
to calculate the rolling window, rather than the DataFrame's index.
Provided integer column is ignored and excluded from result since
an integer index is not used to calculate the rolling window.
axis : int or str, default 0
If ``0`` or ``'index'``, roll across the rows.
If ``1`` or ``'columns'``, roll across the columns.
closed : str, default None
If ``'right'``, the first point in the window is excluded from calculations.
If ``'left'``, the last point in the window is excluded from calculations.
If ``'both'``, the no points in the window are excluded from calculations.
If ``'neither'``, the first and last points in the window are excluded
from calculations.
Default ``None`` (``'right'``).
.. versionchanged:: 1.2.0
The closed parameter with fixed windows is now supported.
method : str {'single', 'table'}, default 'single'
.. versionadded:: 1.3.0
Execute the rolling operation per single column or row (``'single'``)
or over the entire object (``'table'``).
This argument is only implemented when specifying ``engine='numba'``
in the method call.
Returns
-------
``Window`` subclass if a ``win_type`` is passed
``Rolling`` subclass if ``win_type`` is not passed
See Also
--------
expanding : Provides expanding transformations.
ewm : Provides exponential weighted functions.
Notes
-----
See :ref:`Windowing Operations <window.generic>` for further usage details
and examples.
Examples
--------
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
**window**
Rolling sum with a window length of 2 observations.
>>> df.rolling(2).sum()
B
0 NaN
1 1.0
2 3.0
3 NaN
4 NaN
Rolling sum with a window span of 2 seconds.
>>> df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
... index = [pd.Timestamp('20130101 09:00:00'),
... pd.Timestamp('20130101 09:00:02'),
... pd.Timestamp('20130101 09:00:03'),
... pd.Timestamp('20130101 09:00:05'),
... pd.Timestamp('20130101 09:00:06')])
>>> df_time
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 2.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
>>> df_time.rolling('2s').sum()
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 3.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
Rolling sum with forward looking windows with 2 observations.
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
B
0 1.0
1 3.0
2 2.0
3 4.0
4 4.0
**min_periods**
Rolling sum with a window length of 2 observations, but only needs a minimum of 1
observation to calculate a value.
>>> df.rolling(2, min_periods=1).sum()
B
0 0.0
1 1.0
2 3.0
3 2.0
4 4.0
**center**
Rolling sum with the result assigned to the center of the window index.
>>> df.rolling(3, min_periods=1, center=True).sum()
B
0 1.0
1 3.0
2 3.0
3 6.0
4 4.0
>>> df.rolling(3, min_periods=1, center=False).sum()
B
0 0.0
1 1.0
2 3.0
3 3.0
4 6.0
**win_type**
Rolling sum with a window length of 2, using the Scipy ``'gaussian'``
window type. ``std`` is required in the aggregation function.
>>> df.rolling(2, win_type='gaussian').sum(std=3)
B
0 NaN
1 0.986207
2 2.958621
3 NaN
4 NaN
- sample(self: 'NDFrameT', n: 'int | None' = None, frac: 'float | None' = None, replace: 'bool_t' = False, weights=None, random_state: 'RandomState | None' = None, axis: 'Axis | None' = None, ignore_index: 'bool_t' = False) -> 'NDFrameT'
- Return a random sample of items from an axis of object.
You can use `random_state` for reproducibility.
Parameters
----------
n : int, optional
Number of items from axis to return. Cannot be used with `frac`.
Default = 1 if `frac` = None.
frac : float, optional
Fraction of axis items to return. Cannot be used with `n`.
replace : bool, default False
Allow or disallow sampling of the same row more than once.
weights : str or ndarray-like, optional
Default 'None' results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator.
If np.random.RandomState or np.random.Generator, use as given.
.. versionchanged:: 1.1.0
array-like and BitGenerator object now passed to np.random.RandomState()
as seed
.. versionchanged:: 1.4.0
np.random.Generator objects now accepted
axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis
for given data type (0 for Series and DataFrames).
ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.3.0
Returns
-------
Series or DataFrame
A new object of same type as caller containing `n` items randomly
sampled from the caller object.
See Also
--------
DataFrameGroupBy.sample: Generates random samples from each group of a
DataFrame object.
SeriesGroupBy.sample: Generates random samples from each group of a
Series object.
numpy.random.choice: Generates a random sample from a given 1-D numpy
array.
Notes
-----
If `frac` > 1, `replacement` should be set to `True`.
Examples
--------
>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
Extract 3 random elements from the ``Series`` ``df['num_legs']``:
Note that we use `random_state` to ensure the reproducibility of
the examples.
>>> df['num_legs'].sample(n=3, random_state=1)
fish 0
spider 8
falcon 2
Name: num_legs, dtype: int64
A random 50% sample of the ``DataFrame`` with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
An upsample sample of the ``DataFrame`` with replacement:
Note that `replace` parameter has to be `True` for `frac` parameter > 1.
>>> df.sample(frac=2, replace=True, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
falcon 2 2 10
falcon 2 2 10
fish 0 0 8
dog 4 0 2
fish 0 0 8
dog 4 0 2
Using a DataFrame column as weights. Rows with larger value in the
`num_specimen_seen` column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
num_legs num_wings num_specimen_seen
falcon 2 2 10
fish 0 0 8
- set_flags(self: 'NDFrameT', *, copy: 'bool_t' = False, allows_duplicate_labels: 'bool_t | None' = None) -> 'NDFrameT'
- Return a new object with updated flags.
Parameters
----------
allows_duplicate_labels : bool, optional
Whether the returned object allows duplicate labels.
Returns
-------
Series or DataFrame
The same type as the caller.
See Also
--------
DataFrame.attrs : Global metadata applying to this dataset.
DataFrame.flags : Global flags applying to this object.
Notes
-----
This method returns a new object that's a view on the same data
as the input. Mutating the input or the output values will be reflected
in the other.
This method is intended to be used in method chains.
"Flags" differ from "metadata". Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in :attr:`DataFrame.attrs`.
Examples
--------
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
- slice_shift(self: 'NDFrameT', periods: 'int' = 1, axis=0) -> 'NDFrameT'
- Equivalent to `shift` without copying data.
The shifted data will not include the dropped periods and the
shifted axis will be smaller than the original.
.. deprecated:: 1.2.0
slice_shift is deprecated,
use DataFrame/Series.shift instead.
Parameters
----------
periods : int
Number of periods to move, can be positive or negative.
Returns
-------
shifted : same type as caller
Notes
-----
While the `slice_shift` is faster than `shift`, you may pay for it
later during alignment.
- squeeze(self, axis=None)
- Squeeze 1 dimensional axis objects into scalars.
Series or DataFrames with a single element are squeezed to a scalar.
DataFrames with a single column or a single row are squeezed to a
Series. Otherwise the object is unchanged.
This method is most useful when you don't know if your
object is a Series or DataFrame, but you do know it has just a single
column. In that case you can safely call `squeeze` to ensure you have a
Series.
Parameters
----------
axis : {0 or 'index', 1 or 'columns', None}, default None
A specific axis to squeeze. By default, all length-1 axes are
squeezed.
Returns
-------
DataFrame, Series, or scalar
The projection after squeezing `axis` or all the axes.
See Also
--------
Series.iloc : Integer-location based indexing for selecting scalars.
DataFrame.iloc : Integer-location based indexing for selecting Series.
Series.to_frame : Inverse of DataFrame.squeeze for a
single-column DataFrame.
Examples
--------
>>> primes = pd.Series([2, 3, 5, 7])
Slicing might produce a Series with a single value:
>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0 2
dtype: int64
>>> even_primes.squeeze()
2
Squeezing objects with more than one value in every axis does nothing:
>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1 3
2 5
3 7
dtype: int64
>>> odd_primes.squeeze()
1 3
2 5
3 7
dtype: int64
Squeezing is even more effective when used with DataFrames.
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> df
a b
0 1 2
1 3 4
Slicing a single column will produce a DataFrame with the columns
having only one value:
>>> df_a = df[['a']]
>>> df_a
a
0 1
1 3
So the columns can be squeezed down, resulting in a Series:
>>> df_a.squeeze('columns')
0 1
1 3
Name: a, dtype: int64
Slicing a single row from a single column will produce a single
scalar DataFrame:
>>> df_0a = df.loc[df.index < 1, ['a']]
>>> df_0a
a
0 1
Squeezing the rows produces a single scalar Series:
>>> df_0a.squeeze('rows')
a 1
Name: 0, dtype: int64
Squeezing all axes will project directly into a scalar:
>>> df_0a.squeeze()
1
- swapaxes(self: 'NDFrameT', axis1, axis2, copy=True) -> 'NDFrameT'
- Interchange axes and swap values axes appropriately.
Returns
-------
y : same as input
- tail(self: 'NDFrameT', n: 'int' = 5) -> 'NDFrameT'
- Return the last `n` rows.
This function returns last `n` rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of `n`, this function returns all rows except
the first `n` rows, equivalent to ``df[n:]``.
Parameters
----------
n : int, default 5
Number of rows to select.
Returns
-------
type of caller
The last `n` rows of the caller object.
See Also
--------
DataFrame.head : The first `n` rows of the caller object.
Examples
--------
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the last 5 lines
>>> df.tail()
animal
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the last `n` lines (three in this case)
>>> df.tail(3)
animal
6 shark
7 whale
8 zebra
For negative values of `n`
>>> df.tail(-3)
animal
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
- take(self: 'NDFrameT', indices, axis=0, is_copy: 'bool_t | None' = None, **kwargs) -> 'NDFrameT'
- Return the elements in the given *positional* indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
Parameters
----------
indices : array-like
An array of ints indicating which positions to take.
axis : {0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. ``0`` means that we are
selecting rows, ``1`` means that we are selecting columns.
is_copy : bool
Before pandas 1.0, ``is_copy=False`` can be specified to ensure
that the return value is an actual copy. Starting with pandas 1.0,
``take`` always returns a copy, and the keyword is therefore
deprecated.
.. deprecated:: 1.0.0
**kwargs
For compatibility with :meth:`numpy.take`. Has no effect on the
output.
Returns
-------
taken : same type as caller
An array-like containing the elements taken from the object.
See Also
--------
DataFrame.loc : Select a subset of a DataFrame by labels.
DataFrame.iloc : Select a subset of a DataFrame by positions.
numpy.take : Take elements from an array along an axis.
Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', np.nan)],
... columns=['name', 'class', 'max_speed'],
... index=[0, 2, 3, 1])
>>> df
name class max_speed
0 falcon bird 389.0
2 parrot bird 24.0
3 lion mammal 80.5
1 monkey mammal NaN
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That's because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
>>> df.take([0, 3])
name class max_speed
0 falcon bird 389.0
1 monkey mammal NaN
Take elements at indices 1 and 2 along the axis 1 (column selection).
>>> df.take([1, 2], axis=1)
class max_speed
0 bird 389.0
2 bird 24.0
3 mammal 80.5
1 mammal NaN
We may take elements using negative integers for positive indices,
starting from the end of the object, just like with Python lists.
>>> df.take([-1, -2])
name class max_speed
1 monkey mammal NaN
3 lion mammal 80.5
- to_clipboard(self, excel: 'bool_t' = True, sep: 'str | None' = None, **kwargs) -> 'None'
- Copy object to the system clipboard.
Write a text representation of object to the system clipboard.
This can be pasted into Excel, for example.
Parameters
----------
excel : bool, default True
Produce output in a csv format for easy pasting into excel.
- True, use the provided separator for csv pasting.
- False, write a string representation of the object to the clipboard.
sep : str, default ``'\t'``
Field delimiter.
**kwargs
These parameters will be passed to DataFrame.to_csv.
See Also
--------
DataFrame.to_csv : Write a DataFrame to a comma-separated values
(csv) file.
read_clipboard : Read text from clipboard and pass to read_csv.
Notes
-----
Requirements for your platform.
- Linux : `xclip`, or `xsel` (with `PyQt4` modules)
- Windows : none
- macOS : none
Examples
--------
Copy the contents of a DataFrame to the clipboard.
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard(sep=',') # doctest: +SKIP
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6
We can omit the index by passing the keyword `index` and setting
it to false.
>>> df.to_clipboard(sep=',', index=False) # doctest: +SKIP
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
- to_csv(self, path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', line_terminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None'
- Write object to a comma-separated values (csv) file.
Parameters
----------
path_or_buf : str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string. If a non-binary file object is passed, it should
be opened with `newline=''`, disabling universal newlines. If a binary
file object is passed, `mode` might need to contain a `'b'`.
.. versionchanged:: 1.2.0
Support for binary file objects was introduced.
sep : str, default ','
String of length 1. Field delimiter for the output file.
na_rep : str, default ''
Missing data representation.
float_format : str, default None
Format string for floating point numbers.
columns : sequence, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode : str
Python write mode, default 'w'.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'. `encoding` is not supported if `path_or_buf`
is a non-binary file object.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and '%s'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
.. versionchanged:: 1.0.0
May now be a dict with key 'method' as compression mode
and other entries as additional compression options if
compression mode is 'zip'.
.. versionchanged:: 1.1.0
Passing compression options as keys in dict is
supported for compression modes 'gzip', 'bz2', 'zstd', and 'zip'.
.. versionchanged:: 1.2.0
Compression is supported for binary file objects.
.. versionchanged:: 1.2.0
Previous versions forwarded dict entries for 'gzip' to
`gzip.open` instead of `gzip.GzipFile` which prevented
setting `mtime`.
quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a `float_format`
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotechar : str, default '\"'
String of length 1. Character used to quote fields.
line_terminator : str, optional
The newline character or character sequence to use in the output
file. Defaults to `os.linesep`, which depends on the OS in which
this method is called ('\\n' for linux, '\\r\\n' for Windows, i.e.).
chunksize : int or None
Rows to write at a time.
date_format : str, default None
Format string for datetime objects.
doublequote : bool, default True
Control quoting of `quotechar` inside a field.
escapechar : str, default None
String of length 1. Character used to escape `sep` and `quotechar`
when appropriate.
decimal : str, default '.'
Character recognized as decimal separator. E.g. use ',' for
European data.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting csv format as a
string. Otherwise returns None.
See Also
--------
read_csv : Load a CSV file into a DataFrame.
to_excel : Write DataFrame to an Excel file.
Examples
--------
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
... 'mask': ['red', 'purple'],
... 'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create 'out.zip' containing 'out.csv'
>>> compression_opts = dict(method='zip',
... archive_name='out.csv') # doctest: +SKIP
>>> df.to_csv('out.zip', index=False,
... compression=compression_opts) # doctest: +SKIP
To write a csv file to a new folder or nested folder you will first
need to create it using either Pathlib or os:
>>> from pathlib import Path # doctest: +SKIP
>>> filepath = Path('folder/subfolder/out.csv') # doctest: +SKIP
>>> filepath.parent.mkdir(parents=True, exist_ok=True) # doctest: +SKIP
>>> df.to_csv(filepath) # doctest: +SKIP
>>> import os # doctest: +SKIP
>>> os.makedirs('folder/subfolder', exist_ok=True) # doctest: +SKIP
>>> df.to_csv('folder/subfolder/out.csv') # doctest: +SKIP
- to_excel(self, excel_writer, sheet_name: 'str' = 'Sheet1', na_rep: 'str' = '', float_format: 'str | None' = None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options: 'StorageOptions' = None) -> 'None'
- Write object to an Excel sheet.
To write a single object to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an `ExcelWriter` object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique `sheet_name`.
With all data written to the file it is necessary to save the changes.
Note that creating an `ExcelWriter` object with a file name that already
exists will result in the contents of the existing file being erased.
Parameters
----------
excel_writer : path-like, file-like, or ExcelWriter object
File path or existing ExcelWriter.
sheet_name : str, default 'Sheet1'
Name of sheet which will contain DataFrame.
na_rep : str, default ''
Missing data representation.
float_format : str, optional
Format string for floating point numbers. For example
``float_format="%.2f"`` will format 0.1234 to 0.12.
columns : sequence or list of str, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of string is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, optional
Column label for index column(s) if desired. If not specified, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the DataFrame uses MultiIndex.
startrow : int, default 0
Upper left cell row to dump data frame.
startcol : int, default 0
Upper left cell column to dump data frame.
engine : str, optional
Write engine to use, 'openpyxl' or 'xlsxwriter'. You can also set this
via the options ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
``io.excel.xlsm.writer``.
.. deprecated:: 1.2.0
As the `xlwt <https://pypi.org/project/xlwt/>`__ package is no longer
maintained, the ``xlwt`` engine will be removed in a future version
of pandas.
merge_cells : bool, default True
Write MultiIndex and Hierarchical Rows as merged cells.
encoding : str, optional
Encoding of the resulting excel file. Only necessary for xlwt,
other writers support unicode natively.
inf_rep : str, default 'inf'
Representation for infinity (there is no native representation for
infinity in Excel).
verbose : bool, default True
Display more information in the error logs.
freeze_panes : tuple of int (length 2), optional
Specifies the one-based bottommost row and rightmost column that
is to be frozen.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
See Also
--------
to_csv : Write DataFrame to a comma-separated values (csv) file.
ExcelWriter : Class for writing DataFrame objects into excel sheets.
read_excel : Read an Excel file into a pandas DataFrame.
read_csv : Read a comma-separated values (csv) file into DataFrame.
Notes
-----
For compatibility with :meth:`~DataFrame.to_csv`,
to_excel serializes lists and dicts to strings before writing.
Once a workbook has been saved it is not possible to write further
data without rewriting the whole workbook.
Examples
--------
Create, write to and save a workbook:
>>> df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx") # doctest: +SKIP
To specify the sheet name:
>>> df1.to_excel("output.xlsx",
... sheet_name='Sheet_name_1') # doctest: +SKIP
If you wish to write to more than one sheet in the workbook, it is
necessary to specify an ExcelWriter object:
>>> df2 = df1.copy()
>>> with pd.ExcelWriter('output.xlsx') as writer: # doctest: +SKIP
... df1.to_excel(writer, sheet_name='Sheet_name_1')
... df2.to_excel(writer, sheet_name='Sheet_name_2')
ExcelWriter can also be used to append to an existing Excel file:
>>> with pd.ExcelWriter('output.xlsx',
... mode='a') as writer: # doctest: +SKIP
... df.to_excel(writer, sheet_name='Sheet_name_3')
To set the library that is used to write the Excel file,
you can pass the `engine` keyword (the default engine is
automatically chosen depending on the file extension):
>>> df1.to_excel('output1.xlsx', engine='xlsxwriter') # doctest: +SKIP
- to_hdf(self, path_or_buf, key: 'str', mode: 'str' = 'a', complevel: 'int | None' = None, complib: 'str | None' = None, append: 'bool_t' = False, format: 'str | None' = None, index: 'bool_t' = True, min_itemsize: 'int | dict[str, int] | None' = None, nan_rep=None, dropna: 'bool_t | None' = None, data_columns: 'Literal[True] | list[str] | None' = None, errors: 'str' = 'strict', encoding: 'str' = 'UTF-8') -> 'None'
- Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file
please use append mode and a different a key.
.. warning::
One can store a subclass of ``DataFrame`` or ``Series`` to HDF5,
but the type of the subclass is lost upon storing.
For more information see the :ref:`user guide <io.hdf5>`.
Parameters
----------
path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
key : str
Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default 'a'
Mode to open file:
- 'w': write, a new file is created (an existing file with
the same name would be deleted).
- 'a': append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
- 'r+': similar to 'a', but the file must already exist.
complevel : {0-9}, default None
Specifies a compression level for data.
A value of 0 or None disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used.
As of v0.20.2 these additional compressors for Blosc are supported
(default if no compressor specified: 'blosc:blosclz'):
{'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
'blosc:zlib', 'blosc:zstd'}.
Specifying a compression library which is not available issues
a ValueError.
append : bool, default False
For Table formats, append the input data to the existing.
format : {'fixed', 'table', None}, default 'fixed'
Possible values:
- 'fixed': Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
- 'table': Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
like searching / selecting subsets of the data.
- If None, pd.get_option('io.hdf.default_format') is checked,
followed by fallback to "fixed".
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
encoding : str, default "UTF-8"
min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
nan_rep : Any, optional
How to represent null values as str.
Not allowed with append=True.
data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See :ref:`io.hdf5-query-data-columns`.
Applicable only to format='table'.
See Also
--------
read_hdf : Read from HDF file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
DataFrame.to_sql : Write to a SQL table.
DataFrame.to_feather : Write out feather-format for DataFrames.
DataFrame.to_csv : Write out to a csv file.
Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
... index=['a', 'b', 'c']) # doctest: +SKIP
>>> df.to_hdf('data.h5', key='df', mode='w') # doctest: +SKIP
We can add another object to the same file:
>>> s = pd.Series([1, 2, 3, 4]) # doctest: +SKIP
>>> s.to_hdf('data.h5', key='s') # doctest: +SKIP
Reading from HDF file:
>>> pd.read_hdf('data.h5', 'df') # doctest: +SKIP
A B
a 1 4
b 2 5
c 3 6
>>> pd.read_hdf('data.h5', 's') # doctest: +SKIP
0 1
1 2
2 3
3 4
dtype: int64
- to_json(self, path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, orient: 'str | None' = None, date_format: 'str | None' = None, double_precision: 'int' = 10, force_ascii: 'bool_t' = True, date_unit: 'str' = 'ms', default_handler: 'Callable[[Any], JSONSerializable] | None' = None, lines: 'bool_t' = False, compression: 'CompressionOptions' = 'infer', index: 'bool_t' = True, indent: 'int | None' = None, storage_options: 'StorageOptions' = None) -> 'str | None'
- Convert the object to a JSON string.
Note NaN's and None will be converted to null and datetime objects
will be converted to UNIX timestamps.
Parameters
----------
path_or_buf : str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string.
orient : str
Indication of expected JSON string format.
* Series:
- default is 'index'
- allowed values are: {'split', 'records', 'index', 'table'}.
* DataFrame:
- default is 'columns'
- allowed values are: {'split', 'records', 'index', 'columns',
'values', 'table'}.
* The format of the JSON string:
- 'split' : dict like {'index' -> [index], 'columns' -> [columns],
'data' -> [values]}
- 'records' : list like [{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
- 'columns' : dict like {column -> {index -> value}}
- 'values' : just the values array
- 'table' : dict like {'schema': {schema}, 'data': {data}}
Describing the data, where data component is like ``orient='records'``.
date_format : {None, 'epoch', 'iso'}
Type of date conversion. 'epoch' = epoch milliseconds,
'iso' = ISO8601. The default depends on the `orient`. For
``orient='table'``, the default is 'iso'. For all other orients,
the default is 'epoch'.
double_precision : int, default 10
The number of decimal places to use when encoding
floating point values.
force_ascii : bool, default True
Force encoded string to be ASCII.
date_unit : str, default 'ms' (milliseconds)
The time unit to encode to, governs timestamp and ISO8601
precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serialisable object.
lines : bool, default False
If 'orient' is 'records' write out line-delimited json format. Will
throw ValueError if incorrect 'orient' since others are not
list-like.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path_or_buf'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
.. versionchanged:: 1.4.0 Zstandard support.
index : bool, default True
Whether to include the index values in the JSON string. Not
including the index (``index=False``) is only supported when
orient is 'split' or 'table'.
indent : int, optional
Length of whitespace used to indent each record.
.. versionadded:: 1.0.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting json format as a
string. Otherwise returns None.
See Also
--------
read_json : Convert a JSON string to pandas object.
Notes
-----
The behavior of ``indent=0`` varies from the stdlib, which does not
indent the output but does insert newlines. Currently, ``indent=0``
and the default ``indent=None`` are equivalent in pandas, though this
may change in a future release.
``orient='table'`` contains a 'pandas_version' field under 'schema'.
This stores the version of `pandas` used in the latest revision of the
schema.
Examples
--------
>>> import json
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"columns": [
"col 1",
"col 2"
],
"index": [
"row 1",
"row 2"
],
"data": [
[
"a",
"b"
],
[
"c",
"d"
]
]
}
Encoding/decoding a Dataframe using ``'records'`` formatted JSON.
Note that index labels are not preserved with this encoding.
>>> result = df.to_json(orient="records")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
[
{
"col 1": "a",
"col 2": "b"
},
{
"col 1": "c",
"col 2": "d"
}
]
Encoding/decoding a Dataframe using ``'index'`` formatted JSON:
>>> result = df.to_json(orient="index")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"row 1": {
"col 1": "a",
"col 2": "b"
},
"row 2": {
"col 1": "c",
"col 2": "d"
}
}
Encoding/decoding a Dataframe using ``'columns'`` formatted JSON:
>>> result = df.to_json(orient="columns")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"col 1": {
"row 1": "a",
"row 2": "c"
},
"col 2": {
"row 1": "b",
"row 2": "d"
}
}
Encoding/decoding a Dataframe using ``'values'`` formatted JSON:
>>> result = df.to_json(orient="values")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
[
[
"a",
"b"
],
[
"c",
"d"
]
]
Encoding with Table Schema:
>>> result = df.to_json(orient="table")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"schema": {
"fields": [
{
"name": "index",
"type": "string"
},
{
"name": "col 1",
"type": "string"
},
{
"name": "col 2",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": [
{
"index": "row 1",
"col 1": "a",
"col 2": "b"
},
{
"index": "row 2",
"col 1": "c",
"col 2": "d"
}
]
}
- to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)
- Render object to a LaTeX tabular, longtable, or nested table.
Requires ``\usepackage{booktabs}``. The output can be copy/pasted
into a main LaTeX document or read from an external file
with ``\input{table.tex}``.
.. versionchanged:: 1.0.0
Added caption and label arguments.
.. versionchanged:: 1.2.0
Added position argument, changed meaning of caption argument.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : list of label, optional
The subset of columns to write. Writes all columns by default.
col_space : int, optional
The minimum width of each column.
header : bool or list of str, default True
Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
na_rep : str, default 'NaN'
Missing data representation.
formatters : list of functions or dict of {str: function}, optional
Formatter functions to apply to columns' elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_format : one-parameter function or str, optional, default None
Formatter for floating point numbers. For example
``float_format="%.2f"`` and ``float_format="{:0.2f}".format`` will
both result in 0.1234 being formatted as 0.12.
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row. By default, the value will be
read from the config module.
index_names : bool, default True
Prints the names of the indexes.
bold_rows : bool, default False
Make the row labels bold in the output.
column_format : str, optional
The columns format as specified in `LaTeX table format
<https://en.wikibooks.org/wiki/LaTeX/Tables>`__ e.g. 'rcl' for 3
columns. By default, 'l' will be used for all columns except
columns of numbers, which default to 'r'.
longtable : bool, optional
By default, the value will be read from the pandas config
module. Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
escape : bool, optional
By default, the value will be read from the pandas config
module. When set to False prevents from escaping latex special
characters in column names.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'.
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
multicolumn : bool, default True
Use \multicolumn to enhance MultiIndex columns.
The default will be read from the config module.
multicolumn_format : str, default 'l'
The alignment for multicolumns, similar to `column_format`
The default will be read from the config module.
multirow : bool, default False
Use \multirow to enhance MultiIndex rows. Requires adding a
\usepackage{multirow} to your LaTeX preamble. Will print
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module.
caption : str or tuple, optional
Tuple (full_caption, short_caption),
which results in ``\caption[short_caption]{full_caption}``;
if a single string is passed, no short caption will be set.
.. versionadded:: 1.0.0
.. versionchanged:: 1.2.0
Optionally allow caption to be a tuple ``(full_caption, short_caption)``.
label : str, optional
The LaTeX label to be placed inside ``\label{}`` in the output.
This is used with ``\ref{}`` in the main ``.tex`` file.
.. versionadded:: 1.0.0
position : str, optional
The LaTeX positional argument for tables, to be placed after
``\begin{}`` in the output.
.. versionadded:: 1.2.0
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
Styler.to_latex : Render a DataFrame to LaTeX with conditional formatting.
DataFrame.to_string : Render a DataFrame to a console-friendly
tabular output.
DataFrame.to_html : Render a DataFrame as an HTML table.
Examples
--------
>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
... mask=['red', 'purple'],
... weapon=['sai', 'bo staff']))
>>> print(df.to_latex(index=False)) # doctest: +SKIP
\begin{tabular}{lll}
\toprule
name & mask & weapon \\
\midrule
Raphael & red & sai \\
Donatello & purple & bo staff \\
\bottomrule
\end{tabular}
- to_pickle(self, path, compression: 'CompressionOptions' = 'infer', protocol: 'int' = 5, storage_options: 'StorageOptions' = None) -> 'None'
- Pickle (serialize) object to file.
Parameters
----------
path : str
File path where the pickled object will be stored.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
.. [1] https://docs.python.org/3/library/pickle.html.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
See Also
--------
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_sql : Write DataFrame to a SQL database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)}) # doctest: +SKIP
>>> original_df # doctest: +SKIP
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl") # doctest: +SKIP
>>> unpickled_df = pd.read_pickle("./dummy.pkl") # doctest: +SKIP
>>> unpickled_df # doctest: +SKIP
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
- to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype: 'DtypeArg | None' = None, method=None) -> 'int | None'
- Write records stored in a DataFrame to a SQL database.
Databases supported by SQLAlchemy [1]_ are supported. Tables can be
newly created, appended to, or overwritten.
Parameters
----------
name : str
Name of SQL table.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.
schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
How to behave if the table already exists.
* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.
index : bool, default True
Write DataFrame index as a column. Uses `index_label` as the column
name in the table.
index_label : str or sequence, default None
Column label for index column(s). If None is given (default) and
`index` is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype : dict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method : {None, 'multi', callable}, optional
Controls the SQL insertion clause used:
* None : Uses standard SQL ``INSERT`` clause (one per row).
* 'multi': Pass multiple values in a single ``INSERT`` clause.
* callable with signature ``(pd_table, conn, keys, data_iter)``.
Details and a sample callable implementation can be found in the
section :ref:`insert method <io.sql.method>`.
Returns
-------
None or int
Number of rows affected by to_sql. None is returned if the callable
passed into ``method`` does not return the number of rows.
The number of returned rows affected is the sum of the ``rowcount``
attribute of ``sqlite3.Cursor`` or SQLAlchemy connectable which may not
reflect the exact number of written rows as stipulated in the
`sqlite3 <https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.rowcount>`__ or
`SQLAlchemy <https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.BaseCursorResult.rowcount>`__.
.. versionadded:: 1.4.0
Raises
------
ValueError
When the table already exists and `if_exists` is 'fail' (the
default).
See Also
--------
read_sql : Read a DataFrame from a table.
Notes
-----
Timezone aware datetime columns will be written as
``Timestamp with timezone`` type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.
References
----------
.. [1] https://docs.sqlalchemy.org
.. [2] https://www.python.org/dev/peps/pep-0249/
Examples
--------
Create an in-memory SQLite database.
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)
Create a table from scratch with 3 rows.
>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> df
name
0 User 1
1 User 2
2 User 3
>>> df.to_sql('users', con=engine)
3
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]
An `sqlalchemy.engine.Connection` can also be passed to `con`:
>>> with engine.begin() as connection:
... df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
... df1.to_sql('users', con=connection, if_exists='append')
2
This is allowed to support operations that require that the same
DBAPI connection is used for the entire operation.
>>> df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
>>> df2.to_sql('users', con=engine, if_exists='append')
2
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
(0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
(1, 'User 7')]
Overwrite the table with just ``df2``.
>>> df2.to_sql('users', con=engine, if_exists='replace',
... index_label='id')
2
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 6'), (1, 'User 7')]
Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.
>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
A
0 1.0
1 NaN
2 2.0
>>> from sqlalchemy.types import Integer
>>> df.to_sql('integers', con=engine, index=False,
... dtype={"A": Integer()})
3
>>> engine.execute("SELECT * FROM integers").fetchall()
[(1,), (None,), (2,)]
- to_xarray(self)
- Return an xarray object from the pandas object.
Returns
-------
xarray.DataArray or xarray.Dataset
Data in the pandas structure converted to Dataset if the object is
a DataFrame, or a DataArray if the object is a Series.
See Also
--------
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
Notes
-----
See the `xarray docs <https://xarray.pydata.org/en/stable/>`__
Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
... ('parrot', 'bird', 24.0, 2),
... ('lion', 'mammal', 80.5, 4),
... ('monkey', 'mammal', np.nan, 4)],
... columns=['name', 'class', 'max_speed',
... 'num_legs'])
>>> df
name class max_speed num_legs
0 falcon bird 389.0 2
1 parrot bird 24.0 2
2 lion mammal 80.5 4
3 monkey mammal NaN 4
>>> df.to_xarray()
<xarray.Dataset>
Dimensions: (index: 4)
Coordinates:
* index (index) int64 0 1 2 3
Data variables:
name (index) object 'falcon' 'parrot' 'lion' 'monkey'
class (index) object 'bird' 'bird' 'mammal' 'mammal'
max_speed (index) float64 389.0 24.0 80.5 nan
num_legs (index) int64 2 2 4 4
>>> df['max_speed'].to_xarray()
<xarray.DataArray 'max_speed' (index: 4)>
array([389. , 24. , 80.5, nan])
Coordinates:
* index (index) int64 0 1 2 3
>>> dates = pd.to_datetime(['2018-01-01', '2018-01-01',
... '2018-01-02', '2018-01-02'])
>>> df_multiindex = pd.DataFrame({'date': dates,
... 'animal': ['falcon', 'parrot',
... 'falcon', 'parrot'],
... 'speed': [350, 18, 361, 15]})
>>> df_multiindex = df_multiindex.set_index(['date', 'animal'])
>>> df_multiindex
speed
date animal
2018-01-01 falcon 350
parrot 18
2018-01-02 falcon 361
parrot 15
>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions: (animal: 2, date: 2)
Coordinates:
* date (date) datetime64[ns] 2018-01-01 2018-01-02
* animal (animal) object 'falcon' 'parrot'
Data variables:
speed (date, animal) int64 350 18 361 15
- truncate(self: 'NDFrameT', before=None, after=None, axis=None, copy: 'bool_t' = True) -> 'NDFrameT'
- Truncate a Series or DataFrame before and after some index value.
This is a useful shorthand for boolean indexing based on index
values above or below certain thresholds.
Parameters
----------
before : date, str, int
Truncate all rows before this index value.
after : date, str, int
Truncate all rows after this index value.
axis : {0 or 'index', 1 or 'columns'}, optional
Axis to truncate. Truncates the index (rows) by default.
copy : bool, default is True,
Return a copy of the truncated section.
Returns
-------
type of caller
The truncated Series or DataFrame.
See Also
--------
DataFrame.loc : Select a subset of a DataFrame by label.
DataFrame.iloc : Select a subset of a DataFrame by position.
Notes
-----
If the index being truncated contains only datetime values,
`before` and `after` may be specified as strings instead of
Timestamps.
Examples
--------
>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
... 'B': ['f', 'g', 'h', 'i', 'j'],
... 'C': ['k', 'l', 'm', 'n', 'o']},
... index=[1, 2, 3, 4, 5])
>>> df
A B C
1 a f k
2 b g l
3 c h m
4 d i n
5 e j o
>>> df.truncate(before=2, after=4)
A B C
2 b g l
3 c h m
4 d i n
The columns of a DataFrame can be truncated.
>>> df.truncate(before="A", after="B", axis="columns")
A B
1 a f
2 b g
3 c h
4 d i
5 e j
For Series, only rows can be truncated.
>>> df['A'].truncate(before=2, after=4)
2 b
3 c
4 d
Name: A, dtype: object
The index values in ``truncate`` can be datetimes or string
dates.
>>> dates = pd.date_range('2016-01-01', '2016-02-01', freq='s')
>>> df = pd.DataFrame(index=dates, data={'A': 1})
>>> df.tail()
A
2016-01-31 23:59:56 1
2016-01-31 23:59:57 1
2016-01-31 23:59:58 1
2016-01-31 23:59:59 1
2016-02-01 00:00:00 1
>>> df.truncate(before=pd.Timestamp('2016-01-05'),
... after=pd.Timestamp('2016-01-10')).tail()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
Because the index is a DatetimeIndex containing only dates, we can
specify `before` and `after` as strings. They will be coerced to
Timestamps before truncation.
>>> df.truncate('2016-01-05', '2016-01-10').tail()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
Note that ``truncate`` assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
>>> df.loc['2016-01-05':'2016-01-10', :].tail()
A
2016-01-10 23:59:55 1
2016-01-10 23:59:56 1
2016-01-10 23:59:57 1
2016-01-10 23:59:58 1
2016-01-10 23:59:59 1
- tshift(self: 'NDFrameT', periods: 'int' = 1, freq=None, axis: 'Axis' = 0) -> 'NDFrameT'
- Shift the time index, using the index's frequency if available.
.. deprecated:: 1.1.0
Use `shift` instead.
Parameters
----------
periods : int
Number of periods to move, can be positive or negative.
freq : DateOffset, timedelta, or str, default None
Increment to use from the tseries module
or time rule expressed as a string (e.g. 'EOM').
axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Corresponds to the axis that contains the Index.
Returns
-------
shifted : Series/DataFrame
Notes
-----
If freq is not specified then tries to use the freq or inferred_freq
attributes of the index. If neither of those attributes exist, a
ValueError is thrown
- tz_convert(self: 'NDFrameT', tz, axis=0, level=None, copy: 'bool_t' = True) -> 'NDFrameT'
- Convert tz-aware axis to target time zone.
Parameters
----------
tz : str or tzinfo object
axis : the axis to convert
level : int, str, default None
If axis is a MultiIndex, convert a specific level. Otherwise
must be None.
copy : bool, default True
Also make a copy of the underlying data.
Returns
-------
{klass}
Object with time zone converted axis.
Raises
------
TypeError
If the axis is tz-naive.
- tz_localize(self: 'NDFrameT', tz, axis=0, level=None, copy: 'bool_t' = True, ambiguous='raise', nonexistent: 'str' = 'raise') -> 'NDFrameT'
- Localize tz-naive index of a Series or DataFrame to target time zone.
This operation localizes the Index. To localize the values in a
timezone-naive Series, use :meth:`Series.dt.tz_localize`.
Parameters
----------
tz : str or tzinfo
axis : the axis to localize
level : int, str, default None
If axis ia a MultiIndex, localize a specific level. Otherwise
must be None.
copy : bool, default True
Also make a copy of the underlying data.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : str, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST. Valid values are:
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
Series or DataFrame
Same type as the input.
Raises
------
TypeError
If the TimeSeries is tz-aware and tz is not None.
Examples
--------
Localize local times:
>>> s = pd.Series([1],
... index=pd.DatetimeIndex(['2018-09-15 01:30:00']))
>>> s.tz_localize('CET')
2018-09-15 01:30:00+02:00 1
dtype: int64
Be careful with DST changes. When there is sequential data, pandas
can infer the DST time:
>>> s = pd.Series(range(7),
... index=pd.DatetimeIndex(['2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.tz_localize('CET', ambiguous='infer')
2018-10-28 01:30:00+02:00 0
2018-10-28 02:00:00+02:00 1
2018-10-28 02:30:00+02:00 2
2018-10-28 02:00:00+01:00 3
2018-10-28 02:30:00+01:00 4
2018-10-28 03:00:00+01:00 5
2018-10-28 03:30:00+01:00 6
dtype: int64
In some cases, inferring the DST is impossible. In such cases, you can
pass an ndarray to the ambiguous parameter to set the DST explicitly
>>> s = pd.Series(range(3),
... index=pd.DatetimeIndex(['2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.tz_localize('CET', ambiguous=np.array([True, True, False]))
2018-10-28 01:20:00+02:00 0
2018-10-28 02:36:00+02:00 1
2018-10-28 03:46:00+01:00 2
dtype: int64
If the DST transition causes nonexistent times, you can shift these
dates forward or backward with a timedelta object or `'shift_forward'`
or `'shift_backward'`.
>>> s = pd.Series(range(2),
... index=pd.DatetimeIndex(['2015-03-29 02:30:00',
... '2015-03-29 03:30:00']))
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
2015-03-29 03:00:00+02:00 0
2015-03-29 03:30:00+02:00 1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
2015-03-29 01:59:59.999999999+01:00 0
2015-03-29 03:30:00+02:00 1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1H'))
2015-03-29 03:30:00+02:00 0
2015-03-29 03:30:00+02:00 1
dtype: int64
- xs(self, key, axis=0, level=None, drop_level: 'bool_t' = True)
- Return cross-section from the Series/DataFrame.
This method takes a `key` argument to select data at a particular
level of a MultiIndex.
Parameters
----------
key : label or tuple of label
Label contained in the index, or partially in a MultiIndex.
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis to retrieve cross-section on.
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate
which levels are used. Levels can be referred by label or position.
drop_level : bool, default True
If False, returns object with same levels as self.
Returns
-------
Series or DataFrame
Cross-section from the original Series or DataFrame
corresponding to the selected index levels.
See Also
--------
DataFrame.loc : Access a group of rows and columns
by label(s) or a boolean array.
DataFrame.iloc : Purely integer-location based indexing
for selection by position.
Notes
-----
`xs` can not be used to set values.
MultiIndex Slicers is a generic way to get/set values on
any level or levels.
It is a superset of `xs` functionality, see
:ref:`MultiIndex Slicers <advanced.mi_slicers>`.
Examples
--------
>>> d = {'num_legs': [4, 4, 2, 2],
... 'num_wings': [0, 0, 2, 2],
... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
... 'animal': ['cat', 'dog', 'bat', 'penguin'],
... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = pd.DataFrame(data=d)
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df
num_legs num_wings
class animal locomotion
mammal cat walks 4 0
dog walks 4 0
bat flies 2 2
bird penguin walks 2 2
Get values at specified index
>>> df.xs('mammal')
num_legs num_wings
animal locomotion
cat walks 4 0
dog walks 4 0
bat flies 2 2
Get values at several indexes
>>> df.xs(('mammal', 'dog'))
num_legs num_wings
locomotion
walks 4 0
Get values at specified index and level
>>> df.xs('cat', level=1)
num_legs num_wings
class locomotion
mammal walks 4 0
Get values at several indexes and levels
>>> df.xs(('bird', 'walks'),
... level=[0, 'locomotion'])
num_legs num_wings
animal
penguin 2 2
Get values at specified column and axis
>>> df.xs('num_wings', axis=1)
class animal locomotion
mammal cat walks 0
dog walks 0
bat flies 2
bird penguin walks 2
Name: num_wings, dtype: int64
Readonly properties inherited from pandas.core.generic.NDFrame:
- dtypes
- Return the dtypes in the DataFrame.
This returns a Series with the data type of each column.
The result's index is the original DataFrame's columns. Columns
with mixed types are stored with the ``object`` dtype. See
:ref:`the User Guide <basics.dtypes>` for more.
Returns
-------
pandas.Series
The data type of each column.
Examples
--------
>>> df = pd.DataFrame({'float': [1.0],
... 'int': [1],
... 'datetime': [pd.Timestamp('20180310')],
... 'string': ['foo']})
>>> df.dtypes
float float64
int int64
datetime datetime64[ns]
string object
dtype: object
- empty
- Indicator whether Series/DataFrame is empty.
True if Series/DataFrame is entirely empty (no items), meaning any of the
axes are of length 0.
Returns
-------
bool
If Series/DataFrame is empty, return True, if not return False.
See Also
--------
Series.dropna : Return series without null values.
DataFrame.dropna : Return DataFrame with labels on given axis omitted
where (all or any) data are missing.
Notes
-----
If Series/DataFrame contains only NaNs, it is still not considered empty. See
the example below.
Examples
--------
An example of an actual empty DataFrame. Notice the index is empty:
>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True
If we only have NaNs in our DataFrame, it is not considered empty! We
will need to drop the NaNs to make the DataFrame empty:
>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
>>> ser_empty = pd.Series({'A' : []})
>>> ser_empty
A []
dtype: object
>>> ser_empty.empty
False
>>> ser_empty = pd.Series()
>>> ser_empty.empty
True
- flags
- Get the properties associated with this pandas object.
The available flags are
* :attr:`Flags.allows_duplicate_labels`
See Also
--------
Flags : Flags that apply to pandas objects.
DataFrame.attrs : Global metadata applying to this dataset.
Notes
-----
"Flags" differ from "metadata". Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in :attr:`DataFrame.attrs`.
Examples
--------
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>
Flags can be get or set using ``.``
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
- ndim
- Return an int representing the number of axes / array dimensions.
Return 1 if Series. Otherwise return 2 if DataFrame.
See Also
--------
ndarray.ndim : Number of array dimensions.
Examples
--------
>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.ndim
1
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.ndim
2
- size
- Return an int representing the number of elements in this object.
Return the number of rows if Series. Otherwise return the number of
rows times number of columns if DataFrame.
See Also
--------
ndarray.size : Number of elements in the array.
Examples
--------
>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
Data descriptors inherited from pandas.core.generic.NDFrame:
- attrs
- Dictionary of global attributes of this dataset.
.. warning::
attrs is experimental and may change without warning.
See Also
--------
DataFrame.flags : Global flags applying to this object.
Data and other attributes inherited from pandas.core.generic.NDFrame:
- __array_priority__ = 1000
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
Data descriptors inherited from pandas.core.accessor.DirNamesMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Readonly properties inherited from pandas.core.indexing.IndexingMixin:
- at
- Access a single value for a row/column label pair.
Similar to ``loc``, in that both provide label-based lookups. Use
``at`` if you only need to get or set a single value in a DataFrame
or Series.
Raises
------
KeyError
If 'label' does not exist in DataFrame.
See Also
--------
DataFrame.iat : Access a single value for a row/column pair by integer
position.
DataFrame.loc : Access a group of rows and columns by label(s).
Series.at : Access a single value using a label.
Examples
--------
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
A B C
4 0 2 3
5 0 4 1
6 10 20 30
Get value at specified row/column pair
>>> df.at[4, 'B']
2
Set value at specified row/column pair
>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10
Get value within a Series
>>> df.loc[5].at['B']
4
- iat
- Access a single value for a row/column pair by integer position.
Similar to ``iloc``, in that both provide integer-based lookups. Use
``iat`` if you only need to get or set a single value in a DataFrame
or Series.
Raises
------
IndexError
When integer position is out of bounds.
See Also
--------
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.loc : Access a group of rows and columns by label(s).
DataFrame.iloc : Access a group of rows and columns by integer position(s).
Examples
--------
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... columns=['A', 'B', 'C'])
>>> df
A B C
0 0 2 3
1 0 4 1
2 10 20 30
Get value at specified row/column pair
>>> df.iat[1, 2]
1
Set value at specified row/column pair
>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10
Get value within a series
>>> df.loc[0].iat[1]
2
- iloc
- Purely integer-location based indexing for selection by position.
``.iloc[]`` is primarily integer position based (from ``0`` to
``length-1`` of the axis), but may also be used with a boolean
array.
Allowed inputs are:
- An integer, e.g. ``5``.
- A list or array of integers, e.g. ``[4, 3, 0]``.
- A slice object with ints, e.g. ``1:7``.
- A boolean array.
- A ``callable`` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above).
This is useful in method chains, when you don't have a reference to the
calling object, but would like to base your selection on some value.
``.iloc`` will raise ``IndexError`` if a requested indexer is
out-of-bounds, except *slice* indexers which allow out-of-bounds
indexing (this conforms with python/numpy *slice* semantics).
See more at :ref:`Selection by Position <indexing.integer>`.
See Also
--------
DataFrame.iat : Fast integer location scalar accessor.
DataFrame.loc : Purely label-location based indexer for selection by label.
Series.iloc : Purely integer-location based indexing for
selection by position.
Examples
--------
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
... {'a': 100, 'b': 200, 'c': 300, 'd': 400},
... {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
**Indexing just the rows**
With a scalar integer.
>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a 1
b 2
c 3
d 4
Name: 0, dtype: int64
With a list of integers.
>>> df.iloc[[0]]
a b c d
0 1 2 3 4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
a b c d
0 1 2 3 4
1 100 200 300 400
With a `slice` object.
>>> df.iloc[:3]
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
With a boolean mask the same length as the index.
>>> df.iloc[[True, False, True]]
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
With a callable, useful in method chains. The `x` passed
to the ``lambda`` is the DataFrame being sliced. This selects
the rows whose index label even.
>>> df.iloc[lambda x: x.index % 2 == 0]
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
**Indexing both axes**
You can mix the indexer types for the index and columns. Use ``:`` to
select the entire axis.
With scalar integers.
>>> df.iloc[0, 1]
2
With lists of integers.
>>> df.iloc[[0, 2], [1, 3]]
b d
0 2 4
2 2000 4000
With `slice` objects.
>>> df.iloc[1:3, 0:3]
a b c
1 100 200 300
2 1000 2000 3000
With a boolean array whose length matches the columns.
>>> df.iloc[:, [True, False, True, False]]
a c
0 1 3
1 100 300
2 1000 3000
With a callable function that expects the Series or DataFrame.
>>> df.iloc[:, lambda df: [0, 2]]
a c
0 1 3
1 100 300
2 1000 3000
- loc
- Access a group of rows and columns by label(s) or a boolean array.
``.loc[]`` is primarily label based, but may also be used with a
boolean array.
Allowed inputs are:
- A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
interpreted as a *label* of the index, and **never** as an
integer position along the index).
- A list or array of labels, e.g. ``['a', 'b', 'c']``.
- A slice object with labels, e.g. ``'a':'f'``.
.. warning:: Note that contrary to usual python slices, **both** the
start and the stop are included
- A boolean array of the same length as the axis being sliced,
e.g. ``[True, False, True]``.
- An alignable boolean Series. The index of the key will be aligned before
masking.
- An alignable Index. The Index of the returned selection will be the input.
- A ``callable`` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above)
See more at :ref:`Selection by Label <indexing.label>`.
Raises
------
KeyError
If any items are not found.
IndexingError
If an indexed key is passed and its index is unalignable to the frame index.
See Also
--------
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.iloc : Access group of rows and columns by integer position(s).
DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the
Series/DataFrame.
Series.loc : Access group of values using labels.
Examples
--------
**Getting values**
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
Single label. Note this returns the row as a Series.
>>> df.loc['viper']
max_speed 4
shield 5
Name: viper, dtype: int64
List of labels. Note using ``[[]]`` returns a DataFrame.
>>> df.loc[['viper', 'sidewinder']]
max_speed shield
viper 4 5
sidewinder 7 8
Single label for row and column
>>> df.loc['cobra', 'shield']
2
Slice with labels for row and single label for column. As mentioned
above, note that both the start and stop of the slice are included.
>>> df.loc['cobra':'viper', 'max_speed']
cobra 1
viper 4
Name: max_speed, dtype: int64
Boolean list with the same length as the row axis
>>> df.loc[[False, False, True]]
max_speed shield
sidewinder 7 8
Alignable boolean Series:
>>> df.loc[pd.Series([False, True, False],
... index=['viper', 'sidewinder', 'cobra'])]
max_speed shield
sidewinder 7 8
Index (same behavior as ``df.reindex``)
>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
max_speed shield
foo
cobra 1 2
viper 4 5
Conditional that returns a boolean Series
>>> df.loc[df['shield'] > 6]
max_speed shield
sidewinder 7 8
Conditional that returns a boolean Series with column labels specified
>>> df.loc[df['shield'] > 6, ['max_speed']]
max_speed
sidewinder 7
Callable that returns a boolean Series
>>> df.loc[lambda df: df['shield'] == 8]
max_speed shield
sidewinder 7 8
**Setting values**
Set value for all items matching the list of labels
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
max_speed shield
cobra 1 2
viper 4 50
sidewinder 7 50
Set value for an entire row
>>> df.loc['cobra'] = 10
>>> df
max_speed shield
cobra 10 10
viper 4 50
sidewinder 7 50
Set value for an entire column
>>> df.loc[:, 'max_speed'] = 30
>>> df
max_speed shield
cobra 30 10
viper 30 50
sidewinder 30 50
Set value for rows matching callable condition
>>> df.loc[df['shield'] > 35] = 0
>>> df
max_speed shield
cobra 30 10
viper 0 0
sidewinder 0 0
**Getting values on a DataFrame with an index that has integer labels**
Another example using integers for the index
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
max_speed shield
7 1 2
8 4 5
9 7 8
Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.
>>> df.loc[7:9]
max_speed shield
7 1 2
8 4 5
9 7 8
**Getting values with a MultiIndex**
A number of examples using a DataFrame with a MultiIndex
>>> tuples = [
... ('cobra', 'mark i'), ('cobra', 'mark ii'),
... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
... ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
... [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
Single label. Note this returns a DataFrame with a single index.
>>> df.loc['cobra']
max_speed shield
mark i 12 2
mark ii 0 4
Single index tuple. Note this returns a Series.
>>> df.loc[('cobra', 'mark ii')]
max_speed 0
shield 4
Name: (cobra, mark ii), dtype: int64
Single label for row and column. Similar to passing in a tuple, this
returns a Series.
>>> df.loc['cobra', 'mark i']
max_speed 12
shield 2
Name: (cobra, mark i), dtype: int64
Single tuple. Note using ``[[]]`` returns a DataFrame.
>>> df.loc[[('cobra', 'mark ii')]]
max_speed shield
cobra mark ii 0 4
Single tuple for the index with a single label for the column
>>> df.loc[('cobra', 'mark i'), 'shield']
2
Slice from index tuple to single label
>>> df.loc[('cobra', 'mark i'):'viper']
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
Slice from index tuple to index tuple
>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __and__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __or__(self, other)
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
- __xor__(self, other)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
|
class DateOffset(RelativeDeltaOffset) |
| |
Standard kind of date increment used for a date range.
Works exactly like the keyword argument form of relativedelta.
Note that the positional argument form of relativedelata is not
supported. Use of the keyword n is discouraged-- you would be better
off specifying n in the keywords you use, but regardless it is
there for you. n is needed for DateOffset subclasses.
DateOffset works as follows. Each offset specify a set of dates
that conform to the DateOffset. For example, Bday defines this
set to be the set of dates that are weekdays (M-F). To test if a
date is in the set of a DateOffset dateOffset we can use the
is_on_offset method: dateOffset.is_on_offset(date).
If a date is not on a valid date, the rollback and rollforward
methods can be used to roll the date to the nearest valid date
before/after the date.
DateOffsets can be created to move dates forward a given number of
valid dates. For example, Bday(2) can be added to a date to move
it two business days forward. If the date does not start on a
valid date, first it is moved to a valid date. Thus pseudo code
is:
def __add__(date):
date = rollback(date) # does nothing if date is valid
return date + <n number of periods>
When a date offset is created for a negative number of periods,
the date is first rolled forward. The pseudo code is:
def __add__(date):
date = rollforward(date) # does nothing is date is valid
return date + <n number of periods>
Zero presents a problem. Should it roll forward or back? We
arbitrarily have it rollforward:
date + BDay(0) == BDay.rollforward(date)
Since 0 is a bit weird, we suggest avoiding its use.
Parameters
----------
n : int, default 1
The number of time periods the offset represents.
If specified without a temporal pattern, defaults to n days.
normalize : bool, default False
Whether to round the result of a DateOffset addition down to the
previous midnight.
**kwds
Temporal parameter that add to or replace the offset value.
Parameters that **add** to the offset (like Timedelta):
- years
- months
- weeks
- days
- hours
- minutes
- seconds
- microseconds
- nanoseconds
Parameters that **replace** the offset value:
- year
- month
- day
- weekday
- hour
- minute
- second
- microsecond
- nanosecond.
See Also
--------
dateutil.relativedelta.relativedelta : The relativedelta type is designed
to be applied to an existing datetime an can replace specific components of
that datetime, or represents an interval of time.
Examples
--------
>>> from pandas.tseries.offsets import DateOffset
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=3)
Timestamp('2017-04-01 09:10:11')
>>> ts = pd.Timestamp('2017-01-01 09:10:11')
>>> ts + DateOffset(months=2)
Timestamp('2017-03-01 09:10:11') |
| |
- Method resolution order:
- DateOffset
- RelativeDeltaOffset
- BaseOffset
- builtins.object
Methods defined here:
- __setattr__(self, name, value)
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Methods inherited from RelativeDeltaOffset:
- __getstate__(...)
- Return a pickleable state
- __init__(self, /, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- __reduce_cython__(...)
- __setstate__(...)
- Reconstruct an instance from a pickled state
- __setstate_cython__(...)
- apply_index = wrapper(self, other)
- is_on_offset(...)
Static methods inherited from RelativeDeltaOffset:
- __new__(*args, **kwargs) from builtins.type
- Create and return a new object. See help(type) for accurate signature.
Methods inherited from BaseOffset:
- __add__(self, value, /)
- Return self+value.
- __call__(self, /, *args, **kwargs)
- Call self as a function.
- __eq__(self, value, /)
- Return self==value.
- __ge__(self, value, /)
- Return self>=value.
- __gt__(self, value, /)
- Return self>value.
- __hash__(self, /)
- Return hash(self).
- __le__(self, value, /)
- Return self<=value.
- __lt__(self, value, /)
- Return self<value.
- __mul__(self, value, /)
- Return self*value.
- __ne__(self, value, /)
- Return self!=value.
- __neg__(self, /)
- -self
- __radd__(self, value, /)
- Return value+self.
- __repr__(self, /)
- Return repr(self).
- __rmul__(self, value, /)
- Return value*self.
- __rsub__(self, value, /)
- Return value-self.
- __sub__(self, value, /)
- Return self-value.
- apply(...)
- copy(...)
- isAnchored(...)
- is_anchored(...)
- is_month_end(...)
- is_month_start(...)
- is_quarter_end(...)
- is_quarter_start(...)
- is_year_end(...)
- is_year_start(...)
- onOffset(...)
- rollback(...)
- Roll provided date backward to next offset only if not on offset.
Returns
-------
TimeStamp
Rolled timestamp if not on offset, otherwise unchanged timestamp.
- rollforward(...)
- Roll provided date forward to next offset only if not on offset.
Returns
-------
TimeStamp
Rolled timestamp if not on offset, otherwise unchanged timestamp.
Data descriptors inherited from BaseOffset:
- base
- Returns a copy of the calling offset object with n=1 and all other
attributes equal.
- freqstr
- kwds
- n
- name
- nanos
- normalize
- rule_code
Data and other attributes inherited from BaseOffset:
- __array_priority__ = 1000
|
class DatetimeIndex(pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin) |
| |
DatetimeIndex(data=None, freq=<no_default>, tz=None, normalize: 'bool' = False, closed=None, ambiguous='raise', dayfirst: 'bool' = False, yearfirst: 'bool' = False, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None) -> 'DatetimeIndex'
Immutable ndarray-like of datetime64 data.
Represented internally as int64, and which can be boxed to Timestamp objects
that are subclasses of datetime and carry metadata.
Parameters
----------
data : array-like (1-dimensional), optional
Optional datetime-like data to construct index with.
freq : str or pandas offset object, optional
One of pandas date offset strings or corresponding objects. The string
'infer' can be passed in order to set the frequency of the index as the
inferred frequency upon creation.
tz : pytz.timezone or dateutil.tz.tzfile or datetime.tzinfo or str
Set the Timezone of the data.
normalize : bool, default False
Normalize start/end dates to midnight before generating date range.
closed : {'left', 'right'}, optional
Set whether to include `start` and `end` that are on the
boundary. The default includes boundary points on either end.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from 03:00
DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC
and at 01:30:00 UTC. In such a situation, the `ambiguous` parameter
dictates how ambiguous times should be handled.
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False signifies a
non-DST time (note that this flag is only applicable for ambiguous
times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous times.
dayfirst : bool, default False
If True, parse dates in `data` with the day first order.
yearfirst : bool, default False
If True parse dates in `data` with the year first order.
dtype : numpy.dtype or DatetimeTZDtype or str, default None
Note that the only NumPy dtype allowed is ‘datetime64[ns]’.
copy : bool, default False
Make a copy of input ndarray.
name : label, default None
Name to be stored in the index.
Attributes
----------
year
month
day
hour
minute
second
microsecond
nanosecond
date
time
timetz
dayofyear
day_of_year
weekofyear
week
dayofweek
day_of_week
weekday
quarter
tz
freq
freqstr
is_month_start
is_month_end
is_quarter_start
is_quarter_end
is_year_start
is_year_end
is_leap_year
inferred_freq
Methods
-------
normalize
strftime
snap
tz_convert
tz_localize
round
floor
ceil
to_period
to_perioddelta
to_pydatetime
to_series
to_frame
month_name
day_name
mean
std
See Also
--------
Index : The base pandas Index type.
TimedeltaIndex : Index of timedelta64 data.
PeriodIndex : Index of Period data.
to_datetime : Convert argument to datetime.
date_range : Create a fixed-frequency DatetimeIndex.
Notes
-----
To learn more about the frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__. |
| |
- Method resolution order:
- DatetimeIndex
- pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin
- pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin
- pandas.core.indexes.extension.NDArrayBackedExtensionIndex
- pandas.core.indexes.extension.ExtensionIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __reduce__(self)
- Helper for pickle.
- ceil(self, *args, **kwargs)
- Perform ceil operation on the data to the specified `freq`.
Parameters
----------
freq : str or Offset
The frequency level to ceil the index to. Must be a fixed
frequency like 'S' (second) not 'ME' (month end). See
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
DatetimeIndex, TimedeltaIndex, or Series
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
Raises
------
ValueError if the `freq` cannot be converted.
Notes
-----
If the timestamps have a timezone, ceiling will take place relative to the
local ("wall") time and re-localized to the same timezone. When ceiling
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
**DatetimeIndex**
>>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='T')
>>> rng.ceil('H')
DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',
'2018-01-01 13:00:00'],
dtype='datetime64[ns]', freq=None)
**Series**
>>> pd.Series(rng).dt.ceil("H")
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 13:00:00
dtype: datetime64[ns]
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> rng_tz = pd.DatetimeIndex(["2021-10-31 01:30:00"], tz="Europe/Amsterdam")
>>> rng_tz.ceil("H", ambiguous=False)
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
>>> rng_tz.ceil("H", ambiguous=True)
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
- day_name(self, *args, **kwargs)
- Return the day names of the :class:`~pandas.Series` or
:class:`~pandas.DatetimeIndex` with specified locale.
Parameters
----------
locale : str, optional
Locale determining the language in which to return the day name.
Default is English locale.
Returns
-------
Series or Index
Series or Index of day names.
Examples
--------
>>> s = pd.Series(pd.date_range(start='2018-01-01', freq='D', periods=3))
>>> s
0 2018-01-01
1 2018-01-02
2 2018-01-03
dtype: datetime64[ns]
>>> s.dt.day_name()
0 Monday
1 Tuesday
2 Wednesday
dtype: object
>>> idx = pd.date_range(start='2018-01-01', freq='D', periods=3)
>>> idx
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'],
dtype='datetime64[ns]', freq='D')
>>> idx.day_name()
Index(['Monday', 'Tuesday', 'Wednesday'], dtype='object')
- floor(self, *args, **kwargs)
- Perform floor operation on the data to the specified `freq`.
Parameters
----------
freq : str or Offset
The frequency level to floor the index to. Must be a fixed
frequency like 'S' (second) not 'ME' (month end). See
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
DatetimeIndex, TimedeltaIndex, or Series
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
Raises
------
ValueError if the `freq` cannot be converted.
Notes
-----
If the timestamps have a timezone, flooring will take place relative to the
local ("wall") time and re-localized to the same timezone. When flooring
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
**DatetimeIndex**
>>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='T')
>>> rng.floor('H')
DatetimeIndex(['2018-01-01 11:00:00', '2018-01-01 12:00:00',
'2018-01-01 12:00:00'],
dtype='datetime64[ns]', freq=None)
**Series**
>>> pd.Series(rng).dt.floor("H")
0 2018-01-01 11:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
dtype: datetime64[ns]
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> rng_tz = pd.DatetimeIndex(["2021-10-31 03:30:00"], tz="Europe/Amsterdam")
>>> rng_tz.floor("2H", ambiguous=False)
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
>>> rng_tz.floor("2H", ambiguous=True)
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
- get_loc(self, key, method=None, tolerance=None)
- Get integer location for requested label
Returns
-------
loc : int
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- indexer_at_time(self, time, asof: 'bool' = False) -> 'npt.NDArray[np.intp]'
- Return index locations of values at particular time of day
(e.g. 9:30AM).
Parameters
----------
time : datetime.time or str
Time passed in either as object (datetime.time) or as string in
appropriate format ("%H:%M", "%H%M", "%I:%M%p", "%I%M%p",
"%H:%M:%S", "%H%M%S", "%I:%M:%S%p", "%I%M%S%p").
Returns
-------
np.ndarray[np.intp]
See Also
--------
indexer_between_time : Get index locations of values between particular
times of day.
DataFrame.at_time : Select values at particular time of day.
- indexer_between_time(self, start_time, end_time, include_start: 'bool' = True, include_end: 'bool' = True) -> 'npt.NDArray[np.intp]'
- Return index locations of values between particular times of day
(e.g., 9:00-9:30AM).
Parameters
----------
start_time, end_time : datetime.time, str
Time passed either as object (datetime.time) or as string in
appropriate format ("%H:%M", "%H%M", "%I:%M%p", "%I%M%p",
"%H:%M:%S", "%H%M%S", "%I:%M:%S%p","%I%M%S%p").
include_start : bool, default True
include_end : bool, default True
Returns
-------
np.ndarray[np.intp]
See Also
--------
indexer_at_time : Get index locations of values at particular time of day.
DataFrame.between_time : Select values between particular times of day.
- isocalendar(self) -> 'DataFrame'
- Calculate year, week, and day according to the ISO 8601 standard.
.. versionadded:: 1.1.0
Returns
-------
DataFrame
With columns year, week and day.
See Also
--------
Timestamp.isocalendar : Function return a 3-tuple containing ISO year,
week number, and weekday for the given Timestamp object.
datetime.date.isocalendar : Return a named tuple object with
three components: year, week and weekday.
Examples
--------
>>> idx = pd.date_range(start='2019-12-29', freq='D', periods=4)
>>> idx.isocalendar()
year week day
2019-12-29 2019 52 7
2019-12-30 2020 1 1
2019-12-31 2020 1 2
2020-01-01 2020 1 3
>>> idx.isocalendar().week
2019-12-29 52
2019-12-30 1
2019-12-31 1
2020-01-01 1
Freq: D, Name: week, dtype: UInt32
- month_name(self, *args, **kwargs)
- Return the month names of the :class:`~pandas.Series` or
:class:`~pandas.DatetimeIndex` with specified locale.
Parameters
----------
locale : str, optional
Locale determining the language in which to return the month name.
Default is English locale.
Returns
-------
Series or Index
Series or Index of month names.
Examples
--------
>>> s = pd.Series(pd.date_range(start='2018-01', freq='M', periods=3))
>>> s
0 2018-01-31
1 2018-02-28
2 2018-03-31
dtype: datetime64[ns]
>>> s.dt.month_name()
0 January
1 February
2 March
dtype: object
>>> idx = pd.date_range(start='2018-01', freq='M', periods=3)
>>> idx
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31'],
dtype='datetime64[ns]', freq='M')
>>> idx.month_name()
Index(['January', 'February', 'March'], dtype='object')
- normalize(self, *args, **kwargs)
- Convert times to midnight.
The time component of the date-time is converted to midnight i.e.
00:00:00. This is useful in cases, when the time does not matter.
Length is unaltered. The timezones are unaffected.
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on Datetime Array/Index.
Returns
-------
DatetimeArray, DatetimeIndex or Series
The same type as the original data. Series will have the same
name and index. DatetimeIndex will have the same name.
See Also
--------
floor : Floor the datetimes to the specified freq.
ceil : Ceil the datetimes to the specified freq.
round : Round the datetimes to the specified freq.
Examples
--------
>>> idx = pd.date_range(start='2014-08-01 10:00', freq='H',
... periods=3, tz='Asia/Calcutta')
>>> idx
DatetimeIndex(['2014-08-01 10:00:00+05:30',
'2014-08-01 11:00:00+05:30',
'2014-08-01 12:00:00+05:30'],
dtype='datetime64[ns, Asia/Calcutta]', freq='H')
>>> idx.normalize()
DatetimeIndex(['2014-08-01 00:00:00+05:30',
'2014-08-01 00:00:00+05:30',
'2014-08-01 00:00:00+05:30'],
dtype='datetime64[ns, Asia/Calcutta]', freq=None)
- round(self, *args, **kwargs)
- Perform round operation on the data to the specified `freq`.
Parameters
----------
freq : str or Offset
The frequency level to round the index to. Must be a fixed
frequency like 'S' (second) not 'ME' (month end). See
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
DatetimeIndex, TimedeltaIndex, or Series
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
Raises
------
ValueError if the `freq` cannot be converted.
Notes
-----
If the timestamps have a timezone, rounding will take place relative to the
local ("wall") time and re-localized to the same timezone. When rounding
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
**DatetimeIndex**
>>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='T')
>>> rng.round('H')
DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',
'2018-01-01 12:00:00'],
dtype='datetime64[ns]', freq=None)
**Series**
>>> pd.Series(rng).dt.round("H")
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
dtype: datetime64[ns]
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> rng_tz = pd.DatetimeIndex(["2021-10-31 03:30:00"], tz="Europe/Amsterdam")
>>> rng_tz.floor("2H", ambiguous=False)
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
>>> rng_tz.floor("2H", ambiguous=True)
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
- slice_indexer(self, start=None, end=None, step=None, kind=<no_default>)
- Return indexer for specified label slice.
Index.slice_indexer, customized to handle time slicing.
In addition to functionality provided by Index.slice_indexer, does the
following:
- if both `start` and `end` are instances of `datetime.time`, it
invokes `indexer_between_time`
- if `start` and `end` are both either string or None perform
value-based selection in non-monotonic cases.
- snap(self, freq='S') -> 'DatetimeIndex'
- Snap time stamps to nearest occurring frequency.
Returns
-------
DatetimeIndex
- std(self, *args, **kwargs)
- Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
Parameters
----------
axis : int optional, default None
Axis for the function to be applied on.
ddof : int, default 1
Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be
NA.
Returns
-------
Timedelta
- strftime(self, *args, **kwargs)
- Convert to Index using specified date_format.
Return an Index of formatted strings specified by date_format, which
supports the same string format as the python standard library. Details
of the string format can be found in `python string format
doc <https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>`__.
Parameters
----------
date_format : str
Date format string (e.g. "%Y-%m-%d").
Returns
-------
ndarray[object]
NumPy ndarray of formatted strings.
See Also
--------
to_datetime : Convert the given argument to datetime.
DatetimeIndex.normalize : Return DatetimeIndex with times to midnight.
DatetimeIndex.round : Round the DatetimeIndex to the specified freq.
DatetimeIndex.floor : Floor the DatetimeIndex to the specified freq.
Examples
--------
>>> rng = pd.date_range(pd.Timestamp("2018-03-10 09:00"),
... periods=3, freq='s')
>>> rng.strftime('%B %d, %Y, %r')
Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',
'March 10, 2018, 09:00:02 AM'],
dtype='object')
- to_julian_date(self) -> 'Float64Index'
- Convert Datetime Array to float64 ndarray of Julian Dates.
0 Julian date is noon January 1, 4713 BC.
https://en.wikipedia.org/wiki/Julian_day
- to_period(self, *args, **kwargs)
- Cast to PeriodArray/Index at a particular frequency.
Converts DatetimeArray/Index to PeriodArray/Index.
Parameters
----------
freq : str or Offset, optional
One of pandas' :ref:`offset strings <timeseries.offset_aliases>`
or an Offset object. Will be inferred by default.
Returns
-------
PeriodArray/Index
Raises
------
ValueError
When converting a DatetimeArray/Index with non-regular values,
so that a frequency cannot be inferred.
See Also
--------
PeriodIndex: Immutable ndarray holding ordinal values.
DatetimeIndex.to_pydatetime: Return DatetimeIndex as object.
Examples
--------
>>> df = pd.DataFrame({"y": [1, 2, 3]},
... index=pd.to_datetime(["2000-03-31 00:00:00",
... "2000-05-31 00:00:00",
... "2000-08-31 00:00:00"]))
>>> df.index.to_period("M")
PeriodIndex(['2000-03', '2000-05', '2000-08'],
dtype='period[M]')
Infer the daily frequency
>>> idx = pd.date_range("2017-01-01", periods=2)
>>> idx.to_period()
PeriodIndex(['2017-01-01', '2017-01-02'],
dtype='period[D]')
- to_perioddelta(self, freq) -> 'TimedeltaIndex'
- Calculate TimedeltaArray of difference between index
values and index converted to PeriodArray at specified
freq. Used for vectorized offsets.
Parameters
----------
freq : Period frequency
Returns
-------
TimedeltaArray/Index
- to_pydatetime(self, *args, **kwargs)
- Return Datetime Array/Index as object ndarray of datetime.datetime
objects.
Returns
-------
datetimes : ndarray[object]
- to_series(self, keep_tz=<no_default>, index=None, name=None)
- Create a Series with both index and values equal to the index keys
useful with map for returning an indexer based on an index.
Parameters
----------
keep_tz : optional, defaults True
Return the data keeping the timezone.
If keep_tz is True:
If the timezone is not set, the resulting
Series will have a datetime64[ns] dtype.
Otherwise the Series will have an datetime64[ns, tz] dtype; the
tz will be preserved.
If keep_tz is False:
Series will have a datetime64[ns] dtype. TZ aware
objects will have the tz removed.
.. versionchanged:: 1.0.0
The default value is now True. In a future version,
this keyword will be removed entirely. Stop passing the
argument to obtain the future behavior and silence the warning.
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
- tz_convert(self, tz) -> 'DatetimeIndex'
- Convert tz-aware Datetime Array/Index from one time zone to another.
Parameters
----------
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone for time. Corresponding timestamps would be converted
to this time zone of the Datetime Array/Index. A `tz` of None will
convert to UTC and remove the timezone information.
Returns
-------
Array or Index
Raises
------
TypeError
If Datetime Array/Index is tz-naive.
See Also
--------
DatetimeIndex.tz : A timezone that has a variable offset from UTC.
DatetimeIndex.tz_localize : Localize tz-naive DatetimeIndex to a
given time zone, or remove timezone from a tz-aware DatetimeIndex.
Examples
--------
With the `tz` parameter, we can change the DatetimeIndex
to other time zones:
>>> dti = pd.date_range(start='2014-08-01 09:00',
... freq='H', periods=3, tz='Europe/Berlin')
>>> dti
DatetimeIndex(['2014-08-01 09:00:00+02:00',
'2014-08-01 10:00:00+02:00',
'2014-08-01 11:00:00+02:00'],
dtype='datetime64[ns, Europe/Berlin]', freq='H')
>>> dti.tz_convert('US/Central')
DatetimeIndex(['2014-08-01 02:00:00-05:00',
'2014-08-01 03:00:00-05:00',
'2014-08-01 04:00:00-05:00'],
dtype='datetime64[ns, US/Central]', freq='H')
With the ``tz=None``, we can remove the timezone (after converting
to UTC if necessary):
>>> dti = pd.date_range(start='2014-08-01 09:00', freq='H',
... periods=3, tz='Europe/Berlin')
>>> dti
DatetimeIndex(['2014-08-01 09:00:00+02:00',
'2014-08-01 10:00:00+02:00',
'2014-08-01 11:00:00+02:00'],
dtype='datetime64[ns, Europe/Berlin]', freq='H')
>>> dti.tz_convert(None)
DatetimeIndex(['2014-08-01 07:00:00',
'2014-08-01 08:00:00',
'2014-08-01 09:00:00'],
dtype='datetime64[ns]', freq='H')
- tz_localize(self, tz, ambiguous='raise', nonexistent='raise') -> 'DatetimeIndex'
- Localize tz-naive Datetime Array/Index to tz-aware
Datetime Array/Index.
This method takes a time zone (tz) naive Datetime Array/Index object
and makes this time zone aware. It does not move the time to another
time zone.
This method can also be used to do the inverse -- to create a time
zone unaware object from an aware object. To that end, pass `tz=None`.
Parameters
----------
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone to convert timestamps to. Passing ``None`` will
remove the time zone information preserving local time.
ambiguous : 'infer', 'NaT', bool array, default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False signifies a
non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
Same type as self
Array/Index converted to the specified time zone.
Raises
------
TypeError
If the Datetime Array/Index is tz-aware and tz is not None.
See Also
--------
DatetimeIndex.tz_convert : Convert tz-aware DatetimeIndex from
one time zone to another.
Examples
--------
>>> tz_naive = pd.date_range('2018-03-01 09:00', periods=3)
>>> tz_naive
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq='D')
Localize DatetimeIndex in US/Eastern time zone:
>>> tz_aware = tz_naive.tz_localize(tz='US/Eastern')
>>> tz_aware
DatetimeIndex(['2018-03-01 09:00:00-05:00',
'2018-03-02 09:00:00-05:00',
'2018-03-03 09:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq=None)
With the ``tz=None``, we can remove the time zone information
while keeping the local time (not converted to UTC):
>>> tz_aware.tz_localize(None)
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq=None)
Be careful with DST changes. When there is sequential data, pandas can
infer the DST time:
>>> s = pd.to_datetime(pd.Series(['2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.dt.tz_localize('CET', ambiguous='infer')
0 2018-10-28 01:30:00+02:00
1 2018-10-28 02:00:00+02:00
2 2018-10-28 02:30:00+02:00
3 2018-10-28 02:00:00+01:00
4 2018-10-28 02:30:00+01:00
5 2018-10-28 03:00:00+01:00
6 2018-10-28 03:30:00+01:00
dtype: datetime64[ns, CET]
In some cases, inferring the DST is impossible. In such cases, you can
pass an ndarray to the ambiguous parameter to set the DST explicitly
>>> s = pd.to_datetime(pd.Series(['2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.dt.tz_localize('CET', ambiguous=np.array([True, True, False]))
0 2018-10-28 01:20:00+02:00
1 2018-10-28 02:36:00+02:00
2 2018-10-28 03:46:00+01:00
dtype: datetime64[ns, CET]
If the DST transition causes nonexistent times, you can shift these
dates forward or backwards with a timedelta object or `'shift_forward'`
or `'shift_backwards'`.
>>> s = pd.to_datetime(pd.Series(['2015-03-29 02:30:00',
... '2015-03-29 03:30:00']))
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
0 2015-03-29 03:00:00+02:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
0 2015-03-29 01:59:59.999999999+01:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1H'))
0 2015-03-29 03:30:00+02:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
- union_many(self, others)
- A bit of a hack to accelerate unioning a collection of indexes.
Static methods defined here:
- __new__(cls, data=None, freq=<no_default>, tz=None, normalize: 'bool' = False, closed=None, ambiguous='raise', dayfirst: 'bool' = False, yearfirst: 'bool' = False, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None) -> 'DatetimeIndex'
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- inferred_type
- Return a string of the type inferred from the values.
Data descriptors defined here:
- date
- Returns numpy array of python :class:`datetime.date` objects.
Namely, the date part of Timestamps without time and
timezone information.
- day
- The day of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="D")
... )
>>> datetime_series
0 2000-01-01
1 2000-01-02
2 2000-01-03
dtype: datetime64[ns]
>>> datetime_series.dt.day
0 1
1 2
2 3
dtype: int64
- day_of_week
- The day of the week with Monday=0, Sunday=6.
Return the day of the week. It is assumed the week starts on
Monday, which is denoted by 0 and ends on Sunday which is denoted
by 6. This method is available on both Series with datetime
values (using the `dt` accessor) or DatetimeIndex.
Returns
-------
Series or Index
Containing integers indicating the day number.
See Also
--------
Series.dt.dayofweek : Alias.
Series.dt.weekday : Alias.
Series.dt.day_name : Returns the name of the day of the week.
Examples
--------
>>> s = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> s.dt.dayofweek
2016-12-31 5
2017-01-01 6
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-07 5
2017-01-08 6
Freq: D, dtype: int64
- day_of_year
- The ordinal day of the year.
- dayofweek
- The day of the week with Monday=0, Sunday=6.
Return the day of the week. It is assumed the week starts on
Monday, which is denoted by 0 and ends on Sunday which is denoted
by 6. This method is available on both Series with datetime
values (using the `dt` accessor) or DatetimeIndex.
Returns
-------
Series or Index
Containing integers indicating the day number.
See Also
--------
Series.dt.dayofweek : Alias.
Series.dt.weekday : Alias.
Series.dt.day_name : Returns the name of the day of the week.
Examples
--------
>>> s = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> s.dt.dayofweek
2016-12-31 5
2017-01-01 6
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-07 5
2017-01-08 6
Freq: D, dtype: int64
- dayofyear
- The ordinal day of the year.
- days_in_month
- The number of days in the month.
- daysinmonth
- The number of days in the month.
- dtype
- The dtype for the DatetimeArray.
.. warning::
A future version of pandas will change dtype to never be a
``numpy.dtype``. Instead, :attr:`DatetimeArray.dtype` will
always be an instance of an ``ExtensionDtype`` subclass.
Returns
-------
numpy.dtype or DatetimeTZDtype
If the values are tz-naive, then ``np.dtype('datetime64[ns]')``
is returned.
If the values are tz-aware, then the ``DatetimeTZDtype``
is returned.
- hour
- The hours of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="h")
... )
>>> datetime_series
0 2000-01-01 00:00:00
1 2000-01-01 01:00:00
2 2000-01-01 02:00:00
dtype: datetime64[ns]
>>> datetime_series.dt.hour
0 0
1 1
2 2
dtype: int64
- is_leap_year
- Boolean indicator if the date belongs to a leap year.
A leap year is a year, which has 366 days (instead of 365) including
29th of February as an intercalary day.
Leap years are years which are multiples of four with the exception
of years divisible by 100 but not by 400.
Returns
-------
Series or ndarray
Booleans indicating if dates belong to a leap year.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> idx = pd.date_range("2012-01-01", "2015-01-01", freq="Y")
>>> idx
DatetimeIndex(['2012-12-31', '2013-12-31', '2014-12-31'],
dtype='datetime64[ns]', freq='A-DEC')
>>> idx.is_leap_year
array([ True, False, False])
>>> dates_series = pd.Series(idx)
>>> dates_series
0 2012-12-31
1 2013-12-31
2 2014-12-31
dtype: datetime64[ns]
>>> dates_series.dt.is_leap_year
0 True
1 False
2 False
dtype: bool
- is_month_end
- Indicates whether the date is the last day of the month.
Returns
-------
Series or array
For Series, returns a Series with boolean values.
For DatetimeIndex, returns a boolean array.
See Also
--------
is_month_start : Return a boolean indicating whether the date
is the first day of the month.
is_month_end : Return a boolean indicating whether the date
is the last day of the month.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> s = pd.Series(pd.date_range("2018-02-27", periods=3))
>>> s
0 2018-02-27
1 2018-02-28
2 2018-03-01
dtype: datetime64[ns]
>>> s.dt.is_month_start
0 False
1 False
2 True
dtype: bool
>>> s.dt.is_month_end
0 False
1 True
2 False
dtype: bool
>>> idx = pd.date_range("2018-02-27", periods=3)
>>> idx.is_month_start
array([False, False, True])
>>> idx.is_month_end
array([False, True, False])
- is_month_start
- Indicates whether the date is the first day of the month.
Returns
-------
Series or array
For Series, returns a Series with boolean values.
For DatetimeIndex, returns a boolean array.
See Also
--------
is_month_start : Return a boolean indicating whether the date
is the first day of the month.
is_month_end : Return a boolean indicating whether the date
is the last day of the month.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> s = pd.Series(pd.date_range("2018-02-27", periods=3))
>>> s
0 2018-02-27
1 2018-02-28
2 2018-03-01
dtype: datetime64[ns]
>>> s.dt.is_month_start
0 False
1 False
2 True
dtype: bool
>>> s.dt.is_month_end
0 False
1 True
2 False
dtype: bool
>>> idx = pd.date_range("2018-02-27", periods=3)
>>> idx.is_month_start
array([False, False, True])
>>> idx.is_month_end
array([False, True, False])
- is_normalized
- Returns True if all of the dates are at midnight ("no time")
- is_quarter_end
- Indicator for whether the date is the last day of a quarter.
Returns
-------
is_quarter_end : Series or DatetimeIndex
The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
See Also
--------
quarter : Return the quarter of the date.
is_quarter_start : Similar property indicating the quarter start.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> df = pd.DataFrame({'dates': pd.date_range("2017-03-30",
... periods=4)})
>>> df.assign(quarter=df.dates.dt.quarter,
... is_quarter_end=df.dates.dt.is_quarter_end)
dates quarter is_quarter_end
0 2017-03-30 1 False
1 2017-03-31 1 True
2 2017-04-01 2 False
3 2017-04-02 2 False
>>> idx = pd.date_range('2017-03-30', periods=4)
>>> idx
DatetimeIndex(['2017-03-30', '2017-03-31', '2017-04-01', '2017-04-02'],
dtype='datetime64[ns]', freq='D')
>>> idx.is_quarter_end
array([False, True, False, False])
- is_quarter_start
- Indicator for whether the date is the first day of a quarter.
Returns
-------
is_quarter_start : Series or DatetimeIndex
The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
See Also
--------
quarter : Return the quarter of the date.
is_quarter_end : Similar property for indicating the quarter start.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> df = pd.DataFrame({'dates': pd.date_range("2017-03-30",
... periods=4)})
>>> df.assign(quarter=df.dates.dt.quarter,
... is_quarter_start=df.dates.dt.is_quarter_start)
dates quarter is_quarter_start
0 2017-03-30 1 False
1 2017-03-31 1 False
2 2017-04-01 2 True
3 2017-04-02 2 False
>>> idx = pd.date_range('2017-03-30', periods=4)
>>> idx
DatetimeIndex(['2017-03-30', '2017-03-31', '2017-04-01', '2017-04-02'],
dtype='datetime64[ns]', freq='D')
>>> idx.is_quarter_start
array([False, False, True, False])
- is_year_end
- Indicate whether the date is the last day of the year.
Returns
-------
Series or DatetimeIndex
The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
See Also
--------
is_year_start : Similar property indicating the start of the year.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> dates = pd.Series(pd.date_range("2017-12-30", periods=3))
>>> dates
0 2017-12-30
1 2017-12-31
2 2018-01-01
dtype: datetime64[ns]
>>> dates.dt.is_year_end
0 False
1 True
2 False
dtype: bool
>>> idx = pd.date_range("2017-12-30", periods=3)
>>> idx
DatetimeIndex(['2017-12-30', '2017-12-31', '2018-01-01'],
dtype='datetime64[ns]', freq='D')
>>> idx.is_year_end
array([False, True, False])
- is_year_start
- Indicate whether the date is the first day of a year.
Returns
-------
Series or DatetimeIndex
The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
See Also
--------
is_year_end : Similar property indicating the last day of the year.
Examples
--------
This method is available on Series with datetime values under
the ``.dt`` accessor, and directly on DatetimeIndex.
>>> dates = pd.Series(pd.date_range("2017-12-30", periods=3))
>>> dates
0 2017-12-30
1 2017-12-31
2 2018-01-01
dtype: datetime64[ns]
>>> dates.dt.is_year_start
0 False
1 False
2 True
dtype: bool
>>> idx = pd.date_range("2017-12-30", periods=3)
>>> idx
DatetimeIndex(['2017-12-30', '2017-12-31', '2018-01-01'],
dtype='datetime64[ns]', freq='D')
>>> idx.is_year_start
array([False, False, True])
- microsecond
- The microseconds of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="us")
... )
>>> datetime_series
0 2000-01-01 00:00:00.000000
1 2000-01-01 00:00:00.000001
2 2000-01-01 00:00:00.000002
dtype: datetime64[ns]
>>> datetime_series.dt.microsecond
0 0
1 1
2 2
dtype: int64
- minute
- The minutes of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="T")
... )
>>> datetime_series
0 2000-01-01 00:00:00
1 2000-01-01 00:01:00
2 2000-01-01 00:02:00
dtype: datetime64[ns]
>>> datetime_series.dt.minute
0 0
1 1
2 2
dtype: int64
- month
- The month as January=1, December=12.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="M")
... )
>>> datetime_series
0 2000-01-31
1 2000-02-29
2 2000-03-31
dtype: datetime64[ns]
>>> datetime_series.dt.month
0 1
1 2
2 3
dtype: int64
- nanosecond
- The nanoseconds of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="ns")
... )
>>> datetime_series
0 2000-01-01 00:00:00.000000000
1 2000-01-01 00:00:00.000000001
2 2000-01-01 00:00:00.000000002
dtype: datetime64[ns]
>>> datetime_series.dt.nanosecond
0 0
1 1
2 2
dtype: int64
- quarter
- The quarter of the date.
- second
- The seconds of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="s")
... )
>>> datetime_series
0 2000-01-01 00:00:00
1 2000-01-01 00:00:01
2 2000-01-01 00:00:02
dtype: datetime64[ns]
>>> datetime_series.dt.second
0 0
1 1
2 2
dtype: int64
- time
- Returns numpy array of :class:`datetime.time` objects.
The time part of the Timestamps.
- timetz
- Returns numpy array of :class:`datetime.time` objects with timezone
information.
The time part of the Timestamps.
- tz
- Return the timezone.
Returns
-------
datetime.tzinfo, pytz.tzinfo.BaseTZInfo, dateutil.tz.tz.tzfile, or None
Returns None when the array is tz-naive.
- tzinfo
- Alias for tz attribute
- week
- The week ordinal of the year.
.. deprecated:: 1.1.0
weekofyear and week have been deprecated.
Please use DatetimeIndex.isocalendar().week instead.
- weekday
- The day of the week with Monday=0, Sunday=6.
Return the day of the week. It is assumed the week starts on
Monday, which is denoted by 0 and ends on Sunday which is denoted
by 6. This method is available on both Series with datetime
values (using the `dt` accessor) or DatetimeIndex.
Returns
-------
Series or Index
Containing integers indicating the day number.
See Also
--------
Series.dt.dayofweek : Alias.
Series.dt.weekday : Alias.
Series.dt.day_name : Returns the name of the day of the week.
Examples
--------
>>> s = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> s.dt.dayofweek
2016-12-31 5
2017-01-01 6
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-07 5
2017-01-08 6
Freq: D, dtype: int64
- weekofyear
- The week ordinal of the year.
.. deprecated:: 1.1.0
weekofyear and week have been deprecated.
Please use DatetimeIndex.isocalendar().week instead.
- year
- The year of the datetime.
Examples
--------
>>> datetime_series = pd.Series(
... pd.date_range("2000-01-01", periods=3, freq="Y")
... )
>>> datetime_series
0 2000-12-31
1 2001-12-31
2 2002-12-31
dtype: datetime64[ns]
>>> datetime_series.dt.year
0 2000
1 2001
2 2002
dtype: int64
Data and other attributes defined here:
- __annotations__ = {'_data': 'DatetimeArray', 'inferred_freq': 'str | None', 'tz': 'tzinfo | None'}
Methods inherited from pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin:
- delete(self, loc)
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- insert(self, loc: 'int', item)
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- is_type_compatible(self, kind: 'str') -> 'bool'
- Whether the index type is compatible with the provided type.
- take(self, indices, axis=0, allow_fill=True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
Readonly properties inherited from pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin:
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Methods inherited from pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin:
- __contains__(self, key: 'Any') -> 'bool'
- Return a boolean indicating whether the provided key is in the index.
Parameters
----------
key : label
The key to check if it is present in the index.
Returns
-------
bool
Whether the key search is in the index.
Raises
------
TypeError
If the key is not hashable.
See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
list-like key is in the index.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> 2 in idx
True
>>> 6 in idx
False
- equals(self, other: 'Any') -> 'bool'
- Determines if two Index objects contain the same elements.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str' = 'NaT', date_format: 'str | None' = None) -> 'list[str]'
- Render a string representation of the Index.
- mean(self, *args, **kwargs)
- Return the mean value of the Array.
.. versionadded:: 0.25.0
Parameters
----------
skipna : bool, default True
Whether to ignore any NaT elements.
axis : int, optional, default 0
Returns
-------
scalar
Timestamp or Timedelta.
See Also
--------
numpy.ndarray.mean : Returns the average of array elements along a given axis.
Series.mean : Return the mean value in a Series.
Notes
-----
mean is only defined for Datetime and Timedelta dtypes, not for Period.
- shift(self: '_T', periods: 'int' = 1, freq=None) -> '_T'
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or string, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.DatetimeIndex
Shifted index.
See Also
--------
Index.shift : Shift values of Index.
PeriodIndex.shift : Shift values of PeriodIndex.
Data descriptors inherited from pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- freq
- Return the frequency object if it is set, otherwise None.
- freqstr
- Return the frequency object as a string if its set, otherwise None.
- hasnans
- return if I have any nans; enables various perf speedups
- inferred_freq
- Tries to return a string representing a frequency guess,
generated by infer_freq. Returns None if it can't autodetect the
frequency.
- resolution
- Returns day, hour, minute, second, millisecond or microsecond
Methods inherited from pandas.core.indexes.extension.ExtensionIndex:
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
Methods inherited from pandas.core.indexes.base.Index:
- __abs__(self)
- __and__(self, other)
- __array__(self, dtype=None) -> 'np.ndarray'
- The array interface, return my values.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other)
- __invert__(self)
- __len__(self) -> 'int'
- Return the length of the Index.
- __neg__(self)
- __nonzero__(self)
- __or__(self, other)
- __pos__(self)
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Return the integer indices that would sort the index.
Parameters
----------
*args
Passed to `numpy.ndarray.argsort`.
**kwargs
Passed to `numpy.ndarray.argsort`.
Returns
-------
np.ndarray[np.intp]
Integer indices that would sort the index if used as
an indexer.
See Also
--------
numpy.argsort : Similar method for NumPy arrays.
Index.sort_values : Return sorted copy of Index.
Examples
--------
>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')
>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])
>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- astype(self, dtype, copy: 'bool' = True)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- copy(self: '_IndexT', name: 'Hashable | None' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names: 'Sequence[Hashable] | None' = None) -> '_IndexT'
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- max(self, axis=None, skipna=True, *args, **kwargs)
- Return the maximum value of the Index.
Parameters
----------
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Maximum value.
See Also
--------
Index.min : Return the minimum value in an Index.
Series.max : Return the maximum value in a Series.
DataFrame.max : Return the maximum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'
For a MultiIndex, the maximum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.max()
('b', 2)
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- min(self, axis=None, skipna=True, *args, **kwargs)
- Return the minimum value of the Index.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Minimum value.
See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
For a MultiIndex, the minimum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.min()
('a', 1)
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- is_monotonic_decreasing
- Return if the index is monotonic decreasing (only equal or
decreasing) values.
Examples
--------
>>> Index([3, 2, 1]).is_monotonic_decreasing
True
>>> Index([3, 2, 2]).is_monotonic_decreasing
True
>>> Index([3, 1, 2]).is_monotonic_decreasing
False
- is_monotonic_increasing
- Return if the index is monotonic increasing (only equal or
increasing) values.
Examples
--------
>>> Index([1, 2, 3]).is_monotonic_increasing
True
>>> Index([1, 2, 2]).is_monotonic_increasing
True
>>> Index([1, 3, 2]).is_monotonic_increasing
False
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- is_all_dates
- Whether or not the index values only consist of dates.
- is_unique
- Return if the index has unique values.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class DatetimeTZDtype(PandasExtensionDtype) |
| |
DatetimeTZDtype(unit: 'str_type | DatetimeTZDtype' = 'ns', tz=None)
An ExtensionDtype for timezone-aware datetime data.
**This is not an actual numpy dtype**, but a duck type.
Parameters
----------
unit : str, default "ns"
The precision of the datetime data. Currently limited
to ``"ns"``.
tz : str, int, or datetime.tzinfo
The timezone.
Attributes
----------
unit
tz
Methods
-------
None
Raises
------
pytz.UnknownTimeZoneError
When the requested timezone cannot be found.
Examples
--------
>>> pd.DatetimeTZDtype(tz='UTC')
datetime64[ns, UTC]
>>> pd.DatetimeTZDtype(tz='dateutil/US/Central')
datetime64[ns, tzfile('/usr/share/zoneinfo/US/Central')] |
| |
- Method resolution order:
- DatetimeTZDtype
- PandasExtensionDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __init__(self, unit: 'str_type | DatetimeTZDtype' = 'ns', tz=None)
- Initialize self. See help(type(self)) for accurate signature.
- __setstate__(self, state) -> 'None'
- __str__(self) -> 'str_type'
- Return str(self).
Class methods defined here:
- construct_array_type() -> 'type_t[DatetimeArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
- construct_from_string(string: 'str_type') -> 'DatetimeTZDtype' from builtins.type
- Construct a DatetimeTZDtype from a string.
Parameters
----------
string : str
The string alias for this DatetimeTZDtype.
Should be formatted like ``datetime64[ns, <tz>]``,
where ``<tz>`` is the timezone name.
Examples
--------
>>> DatetimeTZDtype.construct_from_string('datetime64[ns, UTC]')
datetime64[ns, UTC]
Readonly properties defined here:
- name
- A string representation of the dtype.
- tz
- The timezone.
- unit
- The precision of the datetime data.
Data and other attributes defined here:
- __annotations__ = {'_cache_dtypes': 'dict[str_type, PandasExtensionDtype]', 'kind': 'str_type', 'type': 'type[Timestamp]'}
- base = dtype('<M8[ns]')
- kind = 'M'
- na_value = NaT
- num = 101
- str = '|M8[ns]'
- type = <class 'pandas._libs.tslibs.timestamps.Timestamp'>
- Pandas replacement for python datetime.datetime object.
Timestamp is the pandas equivalent of python's Datetime
and is interchangeable with it in most cases. It's the type used
for the entries that make up a DatetimeIndex, and other timeseries
oriented data structures in pandas.
Parameters
----------
ts_input : datetime-like, str, int, float
Value to be converted to Timestamp.
freq : str, DateOffset
Offset which Timestamp will have.
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone for time which Timestamp will have.
unit : str
Unit used for conversion if ts_input is of type int or float. The
valid values are 'D', 'h', 'm', 's', 'ms', 'us', and 'ns'. For
example, 's' means seconds and 'ms' means milliseconds.
year, month, day : int
hour, minute, second, microsecond : int, optional, default 0
nanosecond : int, optional, default 0
tzinfo : datetime.tzinfo, optional, default None
fold : {0, 1}, default None, keyword-only
Due to daylight saving time, one wall clock time can occur twice
when shifting from summer to winter time; fold describes whether the
datetime-like corresponds to the first (0) or the second time (1)
the wall clock hits the ambiguous time.
.. versionadded:: 1.1.0
Notes
-----
There are essentially three calling conventions for the constructor. The
primary form accepts four parameters. They can be passed by position or
keyword.
The other two forms mimic the parameters from ``datetime.datetime``. They
can be passed by either position or keyword, but not both mixed together.
Examples
--------
Using the primary calling convention:
This converts a datetime-like string
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
This converts a float representing a Unix epoch in units of seconds
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
This converts an int representing a Unix-epoch in units of seconds
and for a particular timezone
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')
Using the other two forms that mimic the API for ``datetime.datetime``:
>>> pd.Timestamp(2017, 1, 1, 12)
Timestamp('2017-01-01 12:00:00')
>>> pd.Timestamp(year=2017, month=1, day=1, hour=12)
Timestamp('2017-01-01 12:00:00')
Methods inherited from PandasExtensionDtype:
- __getstate__(self) -> 'dict[str_type, Any]'
- __repr__(self) -> 'str_type'
- Return a string representation for a particular object.
Class methods inherited from PandasExtensionDtype:
- reset_cache() -> 'None' from builtins.type
- clear the cache
Data and other attributes inherited from PandasExtensionDtype:
- isbuiltin = 0
- isnative = 0
- itemsize = 8
- shape = ()
- subdtype = None
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class ExcelFile(builtins.object) |
| |
ExcelFile(path_or_buffer, engine: 'str | None' = None, storage_options: 'StorageOptions' = None)
Class for parsing tabular excel sheets into DataFrame objects.
See read_excel for more documentation.
Parameters
----------
path_or_buffer : str, bytes, path object (pathlib.Path or py._path.local.LocalPath),
a file-like object, xlrd workbook or openpyxl workbook.
If a string or path object, expected to be a path to a
.xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file.
engine : str, default None
If io is not a buffer or path, this must be set to identify io.
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``
Engine compatibility :
- ``xlrd`` supports old-style Excel files (.xls).
- ``openpyxl`` supports newer Excel file formats.
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
- ``pyxlsb`` supports Binary Excel files.
.. versionchanged:: 1.2.0
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
now only supports old-style ``.xls`` files.
When ``engine=None``, the following logic will be
used to determine the engine:
- If ``path_or_buffer`` is an OpenDocument format (.odf, .ods, .odt),
then `odf <https://pypi.org/project/odfpy/>`_ will be used.
- Otherwise if ``path_or_buffer`` is an xls format,
``xlrd`` will be used.
- Otherwise if ``path_or_buffer`` is in xlsb format,
`pyxlsb <https://pypi.org/project/pyxlsb/>`_ will be used.
.. versionadded:: 1.3.0
- Otherwise if `openpyxl <https://pypi.org/project/openpyxl/>`_ is installed,
then ``openpyxl`` will be used.
- Otherwise if ``xlrd >= 2.0`` is installed, a ``ValueError`` will be raised.
- Otherwise ``xlrd`` will be used and a ``FutureWarning`` will be raised.
This case will raise a ``ValueError`` in a future version of pandas.
.. warning::
Please do not report issues when using ``xlrd`` to read ``.xlsx`` files.
This is not supported, switch to using ``openpyxl`` instead. |
| |
Methods defined here:
- __del__(self)
- __enter__(self)
- __exit__(self, exc_type, exc_value, traceback)
- __fspath__(self)
- __init__(self, path_or_buffer, engine: 'str | None' = None, storage_options: 'StorageOptions' = None)
- Initialize self. See help(type(self)) for accurate signature.
- close(self) -> 'None'
- close io if necessary
- parse(self, sheet_name: 'str | int | list[int] | list[str] | None' = 0, header: 'int | Sequence[int] | None' = 0, names=None, index_col: 'int | Sequence[int] | None' = None, usecols=None, squeeze: 'bool | None' = None, converters=None, true_values: 'Iterable[Hashable] | None' = None, false_values: 'Iterable[Hashable] | None' = None, skiprows: 'Sequence[int] | int | Callable[[int], object] | None' = None, nrows: 'int | None' = None, na_values=None, parse_dates=False, date_parser=None, thousands: 'str | None' = None, comment: 'str | None' = None, skipfooter: 'int' = 0, convert_float: 'bool | None' = None, mangle_dupe_cols: 'bool' = True, **kwds) -> 'DataFrame | dict[str, DataFrame] | dict[int, DataFrame]'
- Parse specified sheet(s) into a DataFrame.
Equivalent to read_excel(ExcelFile, ...) See the read_excel
docstring for more info on accepted parameters.
Returns
-------
DataFrame or dict of DataFrames
DataFrame from the passed in Excel file.
Readonly properties defined here:
- book
- sheet_names
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- ODFReader = <class 'pandas.io.excel._odfreader.ODFReader'>
- Read tables out of OpenDocument formatted files.
Parameters
----------
filepath_or_buffer : str, path to be parsed or
an open readable stream.
storage_options : dict, optional
passed to fsspec for appropriate URLs (see ``_get_filepath_or_buffer``)
- OpenpyxlReader = <class 'pandas.io.excel._openpyxl.OpenpyxlReader'>
- PyxlsbReader = <class 'pandas.io.excel._pyxlsb.PyxlsbReader'>
- XlrdReader = <class 'pandas.io.excel._xlrd.XlrdReader'>
- __annotations__ = {'_engines': 'Mapping[str, Any]'}
|
class ExcelWriter(builtins.object) |
| |
ExcelWriter(path: 'FilePath | WriteExcelBuffer | ExcelWriter', engine: 'str | None' = None, date_format: 'str | None' = None, datetime_format: 'str | None' = None, mode: 'str' = 'w', storage_options: 'StorageOptions' = None, if_sheet_exists: "Literal[('error', 'new', 'replace', 'overlay')] | None" = None, engine_kwargs: 'dict | None' = None, **kwargs)
Class for writing DataFrame objects into excel sheets.
Default is to use :
* xlwt for xls
* xlsxwriter for xlsx if xlsxwriter is installed otherwise openpyxl
* odf for ods.
See DataFrame.to_excel for typical usage.
The writer should be used as a context manager. Otherwise, call `close()` to save
and close any opened file handles.
Parameters
----------
path : str or typing.BinaryIO
Path to xls or xlsx or ods file.
engine : str (optional)
Engine to use for writing. If None, defaults to
``io.excel.<extension>.writer``. NOTE: can only be passed as a keyword
argument.
.. deprecated:: 1.2.0
As the `xlwt <https://pypi.org/project/xlwt/>`__ package is no longer
maintained, the ``xlwt`` engine will be removed in a future
version of pandas.
date_format : str, default None
Format string for dates written into Excel files (e.g. 'YYYY-MM-DD').
datetime_format : str, default None
Format string for datetime objects written into Excel files.
(e.g. 'YYYY-MM-DD HH:MM:SS').
mode : {'w', 'a'}, default 'w'
File mode to use (write or append). Append does not work with fsspec URLs.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc., if using a URL that will
be parsed by ``fsspec``, e.g., starting "s3://", "gcs://".
.. versionadded:: 1.2.0
if_sheet_exists : {'error', 'new', 'replace', 'overlay'}, default 'error'
How to behave when trying to write to a sheet that already
exists (append mode only).
* error: raise a ValueError.
* new: Create a new sheet, with a name determined by the engine.
* replace: Delete the contents of the sheet before writing to it.
* overlay: Write contents to the existing sheet without removing the old
contents.
.. versionadded:: 1.3.0
.. versionchanged:: 1.4.0
Added ``overlay`` option
engine_kwargs : dict, optional
Keyword arguments to be passed into the engine. These will be passed to
the following functions of the respective engines:
* xlsxwriter: ``xlsxwriter.Workbook(file, **engine_kwargs)``
* openpyxl (write mode): ``openpyxl.Workbook(**engine_kwargs)``
* openpyxl (append mode): ``openpyxl.load_workbook(file, **engine_kwargs)``
* odswriter: ``odf.opendocument.OpenDocumentSpreadsheet(**engine_kwargs)``
.. versionadded:: 1.3.0
**kwargs : dict, optional
Keyword arguments to be passed into the engine.
.. deprecated:: 1.3.0
Use engine_kwargs instead.
Attributes
----------
None
Methods
-------
None
Notes
-----
None of the methods and properties are considered public.
For compatibility with CSV writers, ExcelWriter serializes lists
and dicts to strings before writing.
Examples
--------
Default usage:
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"]) # doctest: +SKIP
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
... df.to_excel(writer) # doctest: +SKIP
To write to separate sheets in a single file:
>>> df1 = pd.DataFrame([["AAA", "BBB"]], columns=["Spam", "Egg"]) # doctest: +SKIP
>>> df2 = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"]) # doctest: +SKIP
>>> with pd.ExcelWriter("path_to_file.xlsx") as writer:
... df1.to_excel(writer, sheet_name="Sheet1") # doctest: +SKIP
... df2.to_excel(writer, sheet_name="Sheet2") # doctest: +SKIP
You can set the date format or datetime format:
>>> from datetime import date, datetime # doctest: +SKIP
>>> df = pd.DataFrame(
... [
... [date(2014, 1, 31), date(1999, 9, 24)],
... [datetime(1998, 5, 26, 23, 33, 4), datetime(2014, 2, 28, 13, 5, 13)],
... ],
... index=["Date", "Datetime"],
... columns=["X", "Y"],
... ) # doctest: +SKIP
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... date_format="YYYY-MM-DD",
... datetime_format="YYYY-MM-DD HH:MM:SS"
... ) as writer:
... df.to_excel(writer) # doctest: +SKIP
You can also append to an existing Excel file:
>>> with pd.ExcelWriter("path_to_file.xlsx", mode="a", engine="openpyxl") as writer:
... df.to_excel(writer, sheet_name="Sheet3") # doctest: +SKIP
Here, the `if_sheet_exists` parameter can be set to replace a sheet if it
already exists:
>>> with ExcelWriter(
... "path_to_file.xlsx",
... mode="a",
... engine="openpyxl",
... if_sheet_exists="replace",
... ) as writer:
... df.to_excel(writer, sheet_name="Sheet1") # doctest: +SKIP
You can also write multiple DataFrames to a single sheet. Note that the
``if_sheet_exists`` parameter needs to be set to ``overlay``:
>>> with ExcelWriter("path_to_file.xlsx",
... mode="a",
... engine="openpyxl",
... if_sheet_exists="overlay",
... ) as writer:
... df1.to_excel(writer, sheet_name="Sheet1")
... df2.to_excel(writer, sheet_name="Sheet1", startcol=3) # doctest: +SKIP
You can store Excel file in RAM:
>>> import io
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"])
>>> buffer = io.BytesIO()
>>> with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
You can pack Excel file into zip archive:
>>> import zipfile # doctest: +SKIP
>>> df = pd.DataFrame([["ABC", "XYZ"]], columns=["Foo", "Bar"]) # doctest: +SKIP
>>> with zipfile.ZipFile("path_to_file.zip", "w") as zf:
... with zf.open("filename.xlsx", "w") as buffer:
... with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer) # doctest: +SKIP
You can specify additional arguments to the underlying engine:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... engine="xlsxwriter",
... engine_kwargs={"options": {"nan_inf_to_errors": True}}
... ) as writer:
... df.to_excel(writer) # doctest: +SKIP
In append mode, ``engine_kwargs`` are passed through to
openpyxl's ``load_workbook``:
>>> with pd.ExcelWriter(
... "path_to_file.xlsx",
... engine="openpyxl",
... mode="a",
... engine_kwargs={"keep_vba": True}
... ) as writer:
... df.to_excel(writer, sheet_name="Sheet2") # doctest: +SKIP |
| |
Methods defined here:
- __enter__(self)
- # Allow use as a contextmanager
- __exit__(self, exc_type, exc_value, traceback)
- __fspath__(self)
- __init__(self, path: 'FilePath | WriteExcelBuffer | ExcelWriter', engine: 'str | None' = None, date_format: 'str | None' = None, datetime_format: 'str | None' = None, mode: 'str' = 'w', storage_options: 'StorageOptions' = None, if_sheet_exists: 'str | None' = None, engine_kwargs: 'dict | None' = None, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- close(self) -> 'None'
- synonym for save, to make it more file-like
- save(self) -> 'None'
- Save workbook to disk.
- write_cells(self, cells, sheet_name: 'str | None' = None, startrow: 'int' = 0, startcol: 'int' = 0, freeze_panes: 'tuple[int, int] | None' = None) -> 'None'
- Write given formatted cells into Excel an excel sheet
Parameters
----------
cells : generator
cell of formatted data to save to Excel sheet
sheet_name : str, default None
Name of Excel sheet, if None, then use self.cur_sheet
startrow : upper left cell row to dump data frame
startcol : upper left cell column to dump data frame
freeze_panes: int tuple of length 2
contains the bottom-most row and right-most column to freeze
Class methods defined here:
- check_extension(ext: 'str') -> 'Literal[True]' from abc.ABCMeta
- checks that path's extension against the Writer's supported
extensions. If it isn't supported, raises UnsupportedFiletypeError.
Static methods defined here:
- __new__(cls, path: 'FilePath | WriteExcelBuffer | ExcelWriter', engine: 'str | None' = None, date_format: 'str | None' = None, datetime_format: 'str | None' = None, mode: 'str' = 'w', storage_options: 'StorageOptions' = None, if_sheet_exists: "Literal[('error', 'new', 'replace', 'overlay')] | None" = None, engine_kwargs: 'dict | None' = None, **kwargs)
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- engine
- Name of engine.
- supported_extensions
- Extensions that writer engine supports.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- __abstractmethods__ = frozenset({'engine', 'save', 'supported_extensions', 'write_cells'})
- path = None
|
class Flags(builtins.object) |
| |
Flags(obj, *, allows_duplicate_labels)
Flags that apply to pandas objects.
.. versionadded:: 1.2.0
Parameters
----------
obj : Series or DataFrame
The object these flags are associated with.
allows_duplicate_labels : bool, default True
Whether to allow duplicate labels in this object. By default,
duplicate labels are permitted. Setting this to ``False`` will
cause an :class:`errors.DuplicateLabelError` to be raised when
`index` (or columns for DataFrame) is not unique, or any
subsequent operation on introduces duplicates.
See :ref:`duplicates.disallow` for more.
.. warning::
This is an experimental feature. Currently, many methods fail to
propagate the ``allows_duplicate_labels`` value. In future versions
it is expected that every method taking or returning one or more
DataFrame or Series objects will propagate ``allows_duplicate_labels``.
Notes
-----
Attributes can be set in two ways
>>> df = pd.DataFrame()
>>> df.flags
<Flags(allows_duplicate_labels=True)>
>>> df.flags.allows_duplicate_labels = False
>>> df.flags
<Flags(allows_duplicate_labels=False)>
>>> df.flags['allows_duplicate_labels'] = True
>>> df.flags
<Flags(allows_duplicate_labels=True)> |
| |
Methods defined here:
- __eq__(self, other)
- Return self==value.
- __getitem__(self, key)
- __init__(self, obj, *, allows_duplicate_labels)
- Initialize self. See help(type(self)) for accurate signature.
- __repr__(self)
- Return repr(self).
- __setitem__(self, key, value)
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
- allows_duplicate_labels
- Whether this object allows duplicate labels.
Setting ``allows_duplicate_labels=False`` ensures that the
index (and columns of a DataFrame) are unique. Most methods
that accept and return a Series or DataFrame will propagate
the value of ``allows_duplicate_labels``.
See :ref:`duplicates` for more.
See Also
--------
DataFrame.attrs : Set global metadata on this object.
DataFrame.set_flags : Set global flags on this object.
Examples
--------
>>> df = pd.DataFrame({"A": [1, 2]}, index=['a', 'a'])
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Traceback (most recent call last):
...
pandas.errors.DuplicateLabelError: Index has duplicates.
positions
label
a [0, 1]
Data and other attributes defined here:
- __hash__ = None
|
class Float32Dtype(FloatingDtype) |
| |
An ExtensionDtype for float32 data.
This dtype uses ``pd.NA`` as missing value indicator.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- Float32Dtype
- FloatingDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'Float32'
- type = <class 'numpy.float32'>
- Single-precision floating-point number type, compatible with C ``float``.
:Character code: ``'f'``
:Canonical name: `numpy.single`
:Alias on this platform (Windows AMD64): `numpy.float32`: 32-bit-precision floating-point number type: sign bit, 8 bits exponent, 23 bits mantissa.
Methods inherited from FloatingDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from FloatingDtype:
- construct_array_type() -> 'type[FloatingArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Float64Dtype(FloatingDtype) |
| |
An ExtensionDtype for float64 data.
This dtype uses ``pd.NA`` as missing value indicator.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- Float64Dtype
- FloatingDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'Float64'
- type = <class 'numpy.float64'>
- Double-precision floating-point number type, compatible with Python `float`
and C ``double``.
:Character code: ``'d'``
:Canonical name: `numpy.double`
:Alias: `numpy.float_`
:Alias on this platform (Windows AMD64): `numpy.float64`: 64-bit precision floating-point number type: sign bit, 11 bits exponent, 52 bits mantissa.
Methods inherited from FloatingDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from FloatingDtype:
- construct_array_type() -> 'type[FloatingArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Grouper(builtins.object) |
| |
Grouper(*args, **kwargs)
A Grouper allows the user to specify a groupby instruction for an object.
This specification will select a column via the key parameter, or if the
level and/or axis parameters are given, a level of the index of the target
object.
If `axis` and/or `level` are passed as keywords to both `Grouper` and
`groupby`, the values passed to `Grouper` take precedence.
Parameters
----------
key : str, defaults to None
Groupby key, which selects the grouping column of the target.
level : name/number, defaults to None
The level for the target index.
freq : str / frequency object, defaults to None
This will groupby the specified frequency if the target selection
(via key or level) is a datetime-like object. For full specification
of available frequencies, please see `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_.
axis : str, int, defaults to 0
Number/name of the axis.
sort : bool, default to False
Whether to sort the resulting labels.
closed : {'left' or 'right'}
Closed end of interval. Only when `freq` parameter is passed.
label : {'left' or 'right'}
Interval boundary to use for labeling.
Only when `freq` parameter is passed.
convention : {'start', 'end', 'e', 's'}
If grouper is PeriodIndex and `freq` parameter is passed.
base : int, default 0
Only when `freq` parameter is passed.
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0.
.. deprecated:: 1.1.0
The new arguments that you should use are 'offset' or 'origin'.
loffset : str, DateOffset, timedelta object
Only when `freq` parameter is passed.
.. deprecated:: 1.1.0
loffset is only working for ``.resample(...)`` and not for
Grouper (:issue:`28302`).
However, loffset is also deprecated for ``.resample(...)``
See: :class:`DataFrame.resample`
origin : Timestamp or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin must
match the timezone of the index.
If string, must be one of the following:
- 'epoch': `origin` is 1970-01-01
- 'start': `origin` is the first value of the timeseries
- 'start_day': `origin` is the first day at midnight of the timeseries
.. versionadded:: 1.1.0
- 'end': `origin` is the last value of the timeseries
- 'end_day': `origin` is the ceiling midnight of the last day
.. versionadded:: 1.3.0
offset : Timedelta or str, default is None
An offset timedelta added to the origin.
.. versionadded:: 1.1.0
dropna : bool, default True
If True, and if group keys contain NA values, NA values together with
row/column will be dropped. If False, NA values will also be treated as
the key in groups.
.. versionadded:: 1.2.0
Returns
-------
A specification for a groupby instruction
Examples
--------
Syntactic sugar for ``df.groupby('A')``
>>> df = pd.DataFrame(
... {
... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
... "Speed": [100, 5, 200, 300, 15],
... }
... )
>>> df
Animal Speed
0 Falcon 100
1 Parrot 5
2 Falcon 200
3 Falcon 300
4 Parrot 15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
Speed
Animal
Falcon 200.0
Parrot 10.0
Specify a resample operation on the column 'Publish date'
>>> df = pd.DataFrame(
... {
... "Publish date": [
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-09"),
... pd.Timestamp("2000-01-16")
... ],
... "ID": [0, 1, 2, 3],
... "Price": [10, 20, 30, 40]
... }
... )
>>> df
Publish date ID Price
0 2000-01-02 0 10
1 2000-01-02 1 20
2 2000-01-09 2 30
3 2000-01-16 3 40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
ID Price
Publish date
2000-01-02 0.5 15.0
2000-01-09 2.0 30.0
2000-01-16 3.0 40.0
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
2000-10-01 23:24:00 3
2000-10-01 23:41:00 15
2000-10-01 23:58:00 45
2000-10-02 00:15:00 45
Freq: 17T, dtype: int64
If you want to adjust the start of the bins with an `offset` Timedelta, the two
following lines are equivalent:
>>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
To replace the use of the deprecated `base` argument, you can now use `offset`,
in this example it is equivalent to have `base=2`:
>>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
2000-10-01 23:16:00 0
2000-10-01 23:33:00 9
2000-10-01 23:50:00 36
2000-10-02 00:07:00 39
2000-10-02 00:24:00 24
Freq: 17T, dtype: int64 |
| |
Methods defined here:
- __init__(self, key=None, level=None, freq=None, axis: 'int' = 0, sort: 'bool' = False, dropna: 'bool' = True)
- Initialize self. See help(type(self)) for accurate signature.
- __repr__(self) -> 'str'
- Return repr(self).
Static methods defined here:
- __new__(cls, *args, **kwargs)
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- ax
- groups
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- __annotations__ = {'_attributes': 'tuple[str, ...]', '_gpr_index': 'Index | None', '_grouper': 'Index | None', 'axis': 'int', 'dropna': 'bool', 'sort': 'bool'}
|
class HDFStore(builtins.object) |
| |
HDFStore(path, mode: 'str' = 'a', complevel: 'int | None' = None, complib=None, fletcher32: 'bool' = False, **kwargs)
Dict-like IO interface for storing pandas objects in PyTables.
Either Fixed or Table format.
.. warning::
Pandas uses PyTables for reading and writing HDF5 files, which allows
serializing object-dtype data with pickle when using the "fixed" format.
Loading pickled data received from untrusted sources can be unsafe.
See: https://docs.python.org/3/library/pickle.html for more.
Parameters
----------
path : str
File path to HDF5 file.
mode : {'a', 'w', 'r', 'r+'}, default 'a'
``'r'``
Read-only; no data can be modified.
``'w'``
Write; a new file is created (an existing file with the same
name would be deleted).
``'a'``
Append; an existing file is opened for reading and writing,
and if the file does not exist it is created.
``'r+'``
It is similar to ``'a'``, but the file must already exist.
complevel : int, 0-9, default None
Specifies a compression level for data.
A value of 0 or None disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used.
As of v0.20.2 these additional compressors for Blosc are supported
(default if no compressor specified: 'blosc:blosclz'):
{'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
'blosc:zlib', 'blosc:zstd'}.
Specifying a compression library which is not available issues
a ValueError.
fletcher32 : bool, default False
If applying compression use the fletcher32 checksum.
**kwargs
These parameters will be passed to the PyTables open_file method.
Examples
--------
>>> bar = pd.DataFrame(np.random.randn(10, 4))
>>> store = pd.HDFStore('test.h5')
>>> store['foo'] = bar # write to HDF5
>>> bar = store['foo'] # retrieve
>>> store.close()
**Create or load HDF5 file in-memory**
When passing the `driver` option to the PyTables open_file method through
**kwargs, the HDF5 file is loaded or created in-memory and will only be
written when closed:
>>> bar = pd.DataFrame(np.random.randn(10, 4))
>>> store = pd.HDFStore('test.h5', driver='H5FD_CORE')
>>> store['foo'] = bar
>>> store.close() # only now, data is written to disk |
| |
Methods defined here:
- __contains__(self, key: 'str') -> 'bool'
- check for existence of this key
can match the exact pathname or the pathnm w/o the leading '/'
- __delitem__(self, key: 'str')
- __enter__(self)
- __exit__(self, exc_type, exc_value, traceback)
- __fspath__(self)
- __getattr__(self, name: 'str')
- allow attribute access to get stores
- __getitem__(self, key: 'str')
- __init__(self, path, mode: 'str' = 'a', complevel: 'int | None' = None, complib=None, fletcher32: 'bool' = False, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- __iter__(self)
- __len__(self) -> 'int'
- __repr__(self) -> 'str'
- Return repr(self).
- __setitem__(self, key: 'str', value)
- append(self, key: 'str', value: 'DataFrame | Series', format=None, axes=None, index=True, append=True, complib=None, complevel: 'int | None' = None, columns=None, min_itemsize: 'int | dict[str, int] | None' = None, nan_rep=None, chunksize=None, expectedrows=None, dropna: 'bool | None' = None, data_columns: 'Literal[True] | list[str] | None' = None, encoding=None, errors: 'str' = 'strict')
- Append to Table in file. Node must already exist and be Table
format.
Parameters
----------
key : str
value : {Series, DataFrame}
format : 'table' is the default
Format to use when storing object in HDFStore. Value can be one of:
``'table'``
Table format. Write as a PyTables Table structure which may perform
worse but allow more flexible operations like searching / selecting
subsets of the data.
append : bool, default True
Append the input data to the existing.
data_columns : list of columns, or True, default None
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#query-via-data-columns>`__.
min_itemsize : dict of columns that specify minimum str sizes
nan_rep : str to use as str nan representation
chunksize : size to chunk the writing
expectedrows : expected TOTAL row size of this table
encoding : default None, provide an encoding for str
dropna : bool, default False
Do not write an ALL nan row to the store settable
by the option 'io.hdf.dropna_table'.
Notes
-----
Does *not* check if data being appended overlaps with existing
data in the table, so be careful
- append_to_multiple(self, d: 'dict', value, selector, data_columns=None, axes=None, dropna=False, **kwargs)
- Append to multiple tables
Parameters
----------
d : a dict of table_name to table_columns, None is acceptable as the
values of one node (this will get all the remaining columns)
value : a pandas object
selector : a string that designates the indexable table; all of its
columns will be designed as data_columns, unless data_columns is
passed, in which case these are used
data_columns : list of columns to create as data columns, or True to
use all columns
dropna : if evaluates to True, drop rows from all tables if any single
row in each table has all NaN. Default False.
Notes
-----
axes parameter is currently not accepted
- close(self)
- Close the PyTables file handle
- copy(self, file, mode='w', propindexes: 'bool' = True, keys=None, complib=None, complevel: 'int | None' = None, fletcher32: 'bool' = False, overwrite=True)
- Copy the existing store to a new file, updating in place.
Parameters
----------
propindexes : bool, default True
Restore indexes in copied file.
keys : list, optional
List of keys to include in the copy (defaults to all).
overwrite : bool, default True
Whether to overwrite (remove and replace) existing nodes in the new store.
mode, complib, complevel, fletcher32 same as in HDFStore.__init__
Returns
-------
open file handle of the new store
- create_table_index(self, key: 'str', columns=None, optlevel: 'int | None' = None, kind: 'str | None' = None)
- Create a pytables index on the table.
Parameters
----------
key : str
columns : None, bool, or listlike[str]
Indicate which columns to create an index on.
* False : Do not create any indexes.
* True : Create indexes on all columns.
* None : Create indexes on all columns.
* listlike : Create indexes on the given columns.
optlevel : int or None, default None
Optimization level, if None, pytables defaults to 6.
kind : str or None, default None
Kind of index, if None, pytables defaults to "medium".
Raises
------
TypeError: raises if the node is not a table
- flush(self, fsync: 'bool' = False)
- Force all buffered modifications to be written to disk.
Parameters
----------
fsync : bool (default False)
call ``os.fsync()`` on the file handle to force writing to disk.
Notes
-----
Without ``fsync=True``, flushing may not guarantee that the OS writes
to disk. With fsync, the operation will block until the OS claims the
file has been written; however, other caching layers may still
interfere.
- get(self, key: 'str')
- Retrieve pandas object stored in file.
Parameters
----------
key : str
Returns
-------
object
Same type as object stored in file.
- get_node(self, key: 'str') -> 'Node | None'
- return the node with the key or None if it does not exist
- get_storer(self, key: 'str') -> 'GenericFixed | Table'
- return the storer object for a key, raise if not in the file
- groups(self)
- Return a list of all the top-level nodes.
Each node returned is not a pandas storage object.
Returns
-------
list
List of objects.
- info(self) -> 'str'
- Print detailed information on the store.
Returns
-------
str
- items(self)
- iterate on key->group
- iteritems = items(self)
- keys(self, include: 'str' = 'pandas') -> 'list[str]'
- Return a list of keys corresponding to objects stored in HDFStore.
Parameters
----------
include : str, default 'pandas'
When kind equals 'pandas' return pandas objects.
When kind equals 'native' return native HDF5 Table objects.
.. versionadded:: 1.1.0
Returns
-------
list
List of ABSOLUTE path-names (e.g. have the leading '/').
Raises
------
raises ValueError if kind has an illegal value
- open(self, mode: 'str' = 'a', **kwargs)
- Open the file in the specified mode
Parameters
----------
mode : {'a', 'w', 'r', 'r+'}, default 'a'
See HDFStore docstring or tables.open_file for info about modes
**kwargs
These parameters will be passed to the PyTables open_file method.
- put(self, key: 'str', value: 'DataFrame | Series', format=None, index=True, append=False, complib=None, complevel: 'int | None' = None, min_itemsize: 'int | dict[str, int] | None' = None, nan_rep=None, data_columns: 'Literal[True] | list[str] | None' = None, encoding=None, errors: 'str' = 'strict', track_times: 'bool' = True, dropna: 'bool' = False)
- Store object in HDFStore.
Parameters
----------
key : str
value : {Series, DataFrame}
format : 'fixed(f)|table(t)', default is 'fixed'
Format to use when storing object in HDFStore. Value can be one of:
``'fixed'``
Fixed format. Fast writing/reading. Not-appendable, nor searchable.
``'table'``
Table format. Write as a PyTables Table structure which may perform
worse but allow more flexible operations like searching / selecting
subsets of the data.
append : bool, default False
This will force Table format, append the input data to the existing.
data_columns : list of columns or True, default None
List of columns to create as data columns, or True to use all columns.
See `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#query-via-data-columns>`__.
encoding : str, default None
Provide an encoding for strings.
track_times : bool, default True
Parameter is propagated to 'create_table' method of 'PyTables'.
If set to False it enables to have the same h5 files (same hashes)
independent on creation time.
.. versionadded:: 1.1.0
- remove(self, key: 'str', where=None, start=None, stop=None)
- Remove pandas object partially by specifying the where condition
Parameters
----------
key : str
Node to remove or delete rows from
where : list of Term (or convertible) objects, optional
start : integer (defaults to None), row number to start selection
stop : integer (defaults to None), row number to stop selection
Returns
-------
number of rows removed (or None if not a Table)
Raises
------
raises KeyError if key is not a valid store
- select(self, key: 'str', where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, auto_close: 'bool' = False)
- Retrieve pandas object stored in file, optionally based on where criteria.
.. warning::
Pandas uses PyTables for reading and writing HDF5 files, which allows
serializing object-dtype data with pickle when using the "fixed" format.
Loading pickled data received from untrusted sources can be unsafe.
See: https://docs.python.org/3/library/pickle.html for more.
Parameters
----------
key : str
Object being retrieved from file.
where : list or None
List of Term (or convertible) objects, optional.
start : int or None
Row number to start selection.
stop : int, default None
Row number to stop selection.
columns : list or None
A list of columns that if not None, will limit the return columns.
iterator : bool or False
Returns an iterator.
chunksize : int or None
Number or rows to include in iteration, return an iterator.
auto_close : bool or False
Should automatically close the store when finished.
Returns
-------
object
Retrieved object from file.
- select_as_coordinates(self, key: 'str', where=None, start: 'int | None' = None, stop: 'int | None' = None)
- return the selection as an Index
.. warning::
Pandas uses PyTables for reading and writing HDF5 files, which allows
serializing object-dtype data with pickle when using the "fixed" format.
Loading pickled data received from untrusted sources can be unsafe.
See: https://docs.python.org/3/library/pickle.html for more.
Parameters
----------
key : str
where : list of Term (or convertible) objects, optional
start : integer (defaults to None), row number to start selection
stop : integer (defaults to None), row number to stop selection
- select_as_multiple(self, keys, where=None, selector=None, columns=None, start=None, stop=None, iterator=False, chunksize=None, auto_close: 'bool' = False)
- Retrieve pandas objects from multiple tables.
.. warning::
Pandas uses PyTables for reading and writing HDF5 files, which allows
serializing object-dtype data with pickle when using the "fixed" format.
Loading pickled data received from untrusted sources can be unsafe.
See: https://docs.python.org/3/library/pickle.html for more.
Parameters
----------
keys : a list of the tables
selector : the table to apply the where criteria (defaults to keys[0]
if not supplied)
columns : the columns I want back
start : integer (defaults to None), row number to start selection
stop : integer (defaults to None), row number to stop selection
iterator : bool, return an iterator, default False
chunksize : nrows to include in iteration, return an iterator
auto_close : bool, default False
Should automatically close the store when finished.
Raises
------
raises KeyError if keys or selector is not found or keys is empty
raises TypeError if keys is not a list or tuple
raises ValueError if the tables are not ALL THE SAME DIMENSIONS
- select_column(self, key: 'str', column: 'str', start: 'int | None' = None, stop: 'int | None' = None)
- return a single column from the table. This is generally only useful to
select an indexable
.. warning::
Pandas uses PyTables for reading and writing HDF5 files, which allows
serializing object-dtype data with pickle when using the "fixed" format.
Loading pickled data received from untrusted sources can be unsafe.
See: https://docs.python.org/3/library/pickle.html for more.
Parameters
----------
key : str
column : str
The column of interest.
start : int or None, default None
stop : int or None, default None
Raises
------
raises KeyError if the column is not found (or key is not a valid
store)
raises ValueError if the column can not be extracted individually (it
is part of a data block)
- walk(self, where='/')
- Walk the pytables group hierarchy for pandas objects.
This generator will yield the group path, subgroups and pandas object
names for each group.
Any non-pandas PyTables objects that are not a group will be ignored.
The `where` group itself is listed first (preorder), then each of its
child groups (following an alphanumerical order) is also traversed,
following the same procedure.
Parameters
----------
where : str, default "/"
Group where to start walking.
Yields
------
path : str
Full path to a group (without trailing '/').
groups : list
Names (strings) of the groups contained in `path`.
leaves : list
Names (strings) of the pandas objects contained in `path`.
Readonly properties defined here:
- filename
- is_open
- return a boolean indicating whether the file is open
- root
- return the root node
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- __annotations__ = {'_complevel': 'int', '_fletcher32': 'bool', '_handle': 'File | None', '_mode': 'str'}
|
class Index(pandas.core.base.IndexOpsMixin, pandas.core.base.PandasObject) |
| |
Index(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs) -> 'Index'
Immutable sequence used for indexing and alignment. The basic object
storing axis labels for all pandas objects.
Parameters
----------
data : array-like (1-dimensional)
dtype : NumPy dtype (default: object)
If dtype is None, we find the dtype that best fits the data.
If an actual dtype is provided, we coerce to that dtype if it's safe.
Otherwise, an error will be raised.
copy : bool
Make a copy of input ndarray.
name : object
Name to be stored in the index.
tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible.
See Also
--------
RangeIndex : Index implementing a monotonic integer range.
CategoricalIndex : Index of :class:`Categorical` s.
MultiIndex : A multi-level, or hierarchical Index.
IntervalIndex : An Index of :class:`Interval` s.
DatetimeIndex : Index of datetime64 data.
TimedeltaIndex : Index of timedelta64 data.
PeriodIndex : Index of Period data.
NumericIndex : Index of numpy int/uint/float data.
Int64Index : Index of purely int64 labels (deprecated).
UInt64Index : Index of purely uint64 labels (deprecated).
Float64Index : Index of purely float64 labels (deprecated).
Notes
-----
An Index instance can **only** contain hashable objects
Examples
--------
>>> pd.Index([1, 2, 3])
Int64Index([1, 2, 3], dtype='int64')
>>> pd.Index(list('abc'))
Index(['a', 'b', 'c'], dtype='object') |
| |
- Method resolution order:
- Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __abs__(self)
- __and__(self, other)
- __array__(self, dtype=None) -> 'np.ndarray'
- The array interface, return my values.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __contains__(self, key: 'Any') -> 'bool'
- Return a boolean indicating whether the provided key is in the index.
Parameters
----------
key : label
The key to check if it is present in the index.
Returns
-------
bool
Whether the key search is in the index.
Raises
------
TypeError
If the key is not hashable.
See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
list-like key is in the index.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> 2 in idx
True
>>> 6 in idx
False
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other)
- __invert__(self)
- __len__(self) -> 'int'
- Return the length of the Index.
- __neg__(self)
- __nonzero__(self)
- __or__(self, other)
- __pos__(self)
- __reduce__(self)
- Helper for pickle.
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Return the integer indices that would sort the index.
Parameters
----------
*args
Passed to `numpy.ndarray.argsort`.
**kwargs
Passed to `numpy.ndarray.argsort`.
Returns
-------
np.ndarray[np.intp]
Integer indices that would sort the index if used as
an indexer.
See Also
--------
numpy.argsort : Similar method for NumPy arrays.
Index.sort_values : Return sorted copy of Index.
Examples
--------
>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')
>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])
>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- astype(self, dtype, copy: 'bool' = True)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- copy(self: '_IndexT', name: 'Hashable | None' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names: 'Sequence[Hashable] | None' = None) -> '_IndexT'
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- delete(self: '_IndexT', loc) -> '_IndexT'
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- equals(self, other: 'Any') -> 'bool'
- Determine if two Index object are equal.
The things that are being compared are:
* The elements inside the Index object.
* The order of the elements inside the Index object.
Parameters
----------
other : Any
The other object to compare against.
Returns
-------
bool
True if "other" is an Index and it has the same elements and order
as the calling index; False otherwise.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3])
>>> idx1
Int64Index([1, 2, 3], dtype='int64')
>>> idx1.equals(pd.Index([1, 2, 3]))
True
The elements inside are compared
>>> idx2 = pd.Index(["1", "2", "3"])
>>> idx2
Index(['1', '2', '3'], dtype='object')
>>> idx1.equals(idx2)
False
The order is compared
>>> ascending_idx = pd.Index([1, 2, 3])
>>> ascending_idx
Int64Index([1, 2, 3], dtype='int64')
>>> descending_idx = pd.Index([3, 2, 1])
>>> descending_idx
Int64Index([3, 2, 1], dtype='int64')
>>> ascending_idx.equals(descending_idx)
False
The dtype is *not* compared
>>> int64_idx = pd.Int64Index([1, 2, 3])
>>> int64_idx
Int64Index([1, 2, 3], dtype='int64')
>>> uint64_idx = pd.UInt64Index([1, 2, 3])
>>> uint64_idx
UInt64Index([1, 2, 3], dtype='uint64')
>>> int64_idx.equals(uint64_idx)
True
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str_t' = 'NaN') -> 'list[str_t]'
- Render a string representation of the Index.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_loc(self, key, method=None, tolerance=None)
- Get integer location, slice or boolean mask for requested label.
Parameters
----------
key : label
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
tolerance : int or float, optional
Maximum distance from index value for inexact matches. The value of
the index at the matching location must satisfy the equation
``abs(index[loc] - key) <= tolerance``.
Returns
-------
loc : int if unique index, slice if monotonic index, else mask
Examples
--------
>>> unique_index = pd.Index(list('abc'))
>>> unique_index.get_loc('b')
1
>>> monotonic_index = pd.Index(list('abbc'))
>>> monotonic_index.get_loc('b')
slice(1, 3, None)
>>> non_monotonic_index = pd.Index(list('abcb'))
>>> non_monotonic_index.get_loc('b')
array([False, True, False, True])
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- insert(self, loc: 'int', item) -> 'Index'
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- is_type_compatible(self, kind: 'str_t') -> 'bool'
- Whether the index type is compatible with the provided type.
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
- max(self, axis=None, skipna=True, *args, **kwargs)
- Return the maximum value of the Index.
Parameters
----------
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Maximum value.
See Also
--------
Index.min : Return the minimum value in an Index.
Series.max : Return the maximum value in a Series.
DataFrame.max : Return the maximum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'
For a MultiIndex, the maximum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.max()
('b', 2)
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- min(self, axis=None, skipna=True, *args, **kwargs)
- Return the minimum value of the Index.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Minimum value.
See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
For a MultiIndex, the minimum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.min()
('a', 1)
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- shift(self, periods=1, freq=None)
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or str, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.Index
Shifted index.
See Also
--------
Series.shift : Shift values of Series.
Notes
-----
This method is only implemented for datetime-like index classes,
i.e., DatetimeIndex, PeriodIndex and TimedeltaIndex.
Examples
--------
Put the first 5 month starts of 2011 into an index.
>>> month_starts = pd.date_range('1/1/2011', periods=5, freq='MS')
>>> month_starts
DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01',
'2011-05-01'],
dtype='datetime64[ns]', freq='MS')
Shift the index by 10 days.
>>> month_starts.shift(10, freq='D')
DatetimeIndex(['2011-01-11', '2011-02-11', '2011-03-11', '2011-04-11',
'2011-05-11'],
dtype='datetime64[ns]', freq=None)
The default value of `freq` is the `freq` attribute of the index,
which is 'MS' (month start) in this example.
>>> month_starts.shift(10)
DatetimeIndex(['2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01'],
dtype='datetime64[ns]', freq='MS')
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- take(self, indices, axis: 'int' = 0, allow_fill: 'bool' = True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Static methods defined here:
- __new__(cls, data=None, dtype=None, copy=False, name=None, tupleize_cols=True, **kwargs) -> 'Index'
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- is_monotonic_decreasing
- Return if the index is monotonic decreasing (only equal or
decreasing) values.
Examples
--------
>>> Index([3, 2, 1]).is_monotonic_decreasing
True
>>> Index([3, 2, 2]).is_monotonic_decreasing
True
>>> Index([3, 1, 2]).is_monotonic_decreasing
False
- is_monotonic_increasing
- Return if the index is monotonic increasing (only equal or
increasing) values.
Examples
--------
>>> Index([1, 2, 3]).is_monotonic_increasing
True
>>> Index([1, 2, 2]).is_monotonic_increasing
True
>>> Index([1, 3, 2]).is_monotonic_increasing
False
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Data descriptors defined here:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- dtype
- Return the dtype object of the underlying data.
- hasnans
- Return True if there are any NaNs.
Enables various performance speedups.
- inferred_type
- Return a string of the type inferred from the values.
- is_all_dates
- Whether or not the index values only consist of dates.
- is_unique
- Return if the index has unique values.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes defined here:
- __annotations__ = {'__hash__': 'None', '_attributes': 'list[str]', '_can_hold_na': 'bool', '_can_hold_strings': 'bool', '_comparables': 'list[str]', '_data': 'ExtensionArray | np.ndarray', '_data_cls': 'type[ExtensionArray] | tuple[type[np.ndarray], type[ExtensionArray]]', '_engine_type': 'type[libindex.IndexEngine]', '_hidden_attrs': 'frozenset[str]', '_id': 'object | None', ...}
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class Int16Dtype(_IntegerDtype) |
| |
An ExtensionDtype for int16 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- Int16Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'Int16'
- type = <class 'numpy.int16'>
- Signed integer type, compatible with C ``short``.
:Character code: ``'h'``
:Canonical name: `numpy.short`
:Alias on this platform (Windows AMD64): `numpy.int16`: 16-bit signed integer (``-32_768`` to ``32_767``).
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Int32Dtype(_IntegerDtype) |
| |
An ExtensionDtype for int32 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- Int32Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'Int32'
- type = <class 'numpy.int32'>
- Signed integer type, compatible with Python `int` and C ``long``.
:Character code: ``'l'``
:Canonical name: `numpy.int_`
:Alias on this platform (Windows AMD64): `numpy.int32`: 32-bit signed integer (``-2_147_483_648`` to ``2_147_483_647``).
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Int64Dtype(_IntegerDtype) |
| |
An ExtensionDtype for int64 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- Int64Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'Int64'
- type = <class 'numpy.int64'>
- Signed integer type, compatible with C ``long long``.
:Character code: ``'q'``
:Canonical name: `numpy.longlong`
:Alias on this platform (Windows AMD64): `numpy.int64`: 64-bit signed integer (``-9_223_372_036_854_775_808`` to ``9_223_372_036_854_775_807``).
:Alias on this platform (Windows AMD64): `numpy.intp`: Signed integer large enough to fit pointer, compatible with C ``intptr_t``.
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Int8Dtype(_IntegerDtype) |
| |
An ExtensionDtype for int8 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- Int8Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'Int8'
- type = <class 'numpy.int8'>
- Signed integer type, compatible with C ``char``.
:Character code: ``'b'``
:Canonical name: `numpy.byte`
:Alias on this platform (Windows AMD64): `numpy.int8`: 8-bit signed integer (``-128`` to ``127``).
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class Interval(IntervalMixin) |
| |
Immutable object implementing an Interval, a bounded slice-like interval.
Parameters
----------
left : orderable scalar
Left bound for the interval.
right : orderable scalar
Right bound for the interval.
closed : {'right', 'left', 'both', 'neither'}, default 'right'
Whether the interval is closed on the left-side, right-side, both or
neither. See the Notes for more detailed explanation.
See Also
--------
IntervalIndex : An Index of Interval objects that are all closed on the
same side.
cut : Convert continuous data into discrete bins (Categorical
of Interval objects).
qcut : Convert continuous data into bins (Categorical of Interval objects)
based on quantiles.
Period : Represents a period of time.
Notes
-----
The parameters `left` and `right` must be from the same type, you must be
able to compare them and they must satisfy ``left <= right``.
A closed interval (in mathematics denoted by square brackets) contains
its endpoints, i.e. the closed interval ``[0, 5]`` is characterized by the
conditions ``0 <= x <= 5``. This is what ``closed='both'`` stands for.
An open interval (in mathematics denoted by parentheses) does not contain
its endpoints, i.e. the open interval ``(0, 5)`` is characterized by the
conditions ``0 < x < 5``. This is what ``closed='neither'`` stands for.
Intervals can also be half-open or half-closed, i.e. ``[0, 5)`` is
described by ``0 <= x < 5`` (``closed='left'``) and ``(0, 5]`` is
described by ``0 < x <= 5`` (``closed='right'``).
Examples
--------
It is possible to build Intervals of different types, like numeric ones:
>>> iv = pd.Interval(left=0, right=5)
>>> iv
Interval(0, 5, closed='right')
You can check if an element belongs to it
>>> 2.5 in iv
True
You can test the bounds (``closed='right'``, so ``0 < x <= 5``):
>>> 0 in iv
False
>>> 5 in iv
True
>>> 0.0001 in iv
True
Calculate its length
>>> iv.length
5
You can operate with `+` and `*` over an Interval and the operation
is applied to each of its bounds, so the result depends on the type
of the bound elements
>>> shifted_iv = iv + 3
>>> shifted_iv
Interval(3, 8, closed='right')
>>> extended_iv = iv * 10.0
>>> extended_iv
Interval(0.0, 50.0, closed='right')
To create a time interval you can use Timestamps as the bounds
>>> year_2017 = pd.Interval(pd.Timestamp('2017-01-01 00:00:00'),
... pd.Timestamp('2018-01-01 00:00:00'),
... closed='left')
>>> pd.Timestamp('2017-01-01 00:00') in year_2017
True
>>> year_2017.length
Timedelta('365 days 00:00:00') |
| |
- Method resolution order:
- Interval
- IntervalMixin
- builtins.object
Methods defined here:
- __add__(self, value, /)
- Return self+value.
- __contains__(self, key, /)
- Return key in self.
- __eq__(self, value, /)
- Return self==value.
- __floordiv__(self, value, /)
- Return self//value.
- __ge__(self, value, /)
- Return self>=value.
- __gt__(self, value, /)
- Return self>value.
- __hash__(self, /)
- Return hash(self).
- __init__(self, /, *args, **kwargs)
- Initialize self. See help(type(self)) for accurate signature.
- __le__(self, value, /)
- Return self<=value.
- __lt__(self, value, /)
- Return self<value.
- __mul__(self, value, /)
- Return self*value.
- __ne__(self, value, /)
- Return self!=value.
- __radd__(self, value, /)
- Return value+self.
- __reduce__(...)
- Helper for pickle.
- __repr__(self, /)
- Return repr(self).
- __rfloordiv__(self, value, /)
- Return value//self.
- __rmul__(self, value, /)
- Return value*self.
- __rsub__(self, value, /)
- Return value-self.
- __rtruediv__(self, value, /)
- Return value/self.
- __str__(self, /)
- Return str(self).
- __sub__(self, value, /)
- Return self-value.
- __truediv__(self, value, /)
- Return self/value.
- overlaps(...)
- Check whether two Interval objects overlap.
Two intervals overlap if they share a common point, including closed
endpoints. Intervals that only have an open endpoint in common do not
overlap.
Parameters
----------
other : Interval
Interval to check against for an overlap.
Returns
-------
bool
True if the two intervals overlap.
See Also
--------
IntervalArray.overlaps : The corresponding method for IntervalArray.
IntervalIndex.overlaps : The corresponding method for IntervalIndex.
Examples
--------
>>> i1 = pd.Interval(0, 2)
>>> i2 = pd.Interval(1, 3)
>>> i1.overlaps(i2)
True
>>> i3 = pd.Interval(4, 5)
>>> i1.overlaps(i3)
False
Intervals that share closed endpoints overlap:
>>> i4 = pd.Interval(0, 1, closed='both')
>>> i5 = pd.Interval(1, 2, closed='both')
>>> i4.overlaps(i5)
True
Intervals that only have an open endpoint in common do not overlap:
>>> i6 = pd.Interval(1, 2, closed='neither')
>>> i4.overlaps(i6)
False
Static methods defined here:
- __new__(*args, **kwargs) from builtins.type
- Create and return a new object. See help(type) for accurate signature.
Data descriptors defined here:
- closed
- Whether the interval is closed on the left-side, right-side, both or
neither.
- left
- Left bound for the interval.
- right
- Right bound for the interval.
Data and other attributes defined here:
- __array_priority__ = 1000
Methods inherited from IntervalMixin:
- __setstate__ = __setstate_cython__(...)
Data descriptors inherited from IntervalMixin:
- closed_left
- Check if the interval is closed on the left side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- closed_right
- Check if the interval is closed on the right side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- is_empty
- Indicates if an interval is empty, meaning it contains no points.
.. versionadded:: 0.25.0
Returns
-------
bool or ndarray
A boolean indicating if a scalar :class:`Interval` is empty, or a
boolean ``ndarray`` positionally indicating if an ``Interval`` in
an :class:`~arrays.IntervalArray` or :class:`IntervalIndex` is
empty.
Examples
--------
An :class:`Interval` that contains points is not empty:
>>> pd.Interval(0, 1, closed='right').is_empty
False
An ``Interval`` that does not contain any points is empty:
>>> pd.Interval(0, 0, closed='right').is_empty
True
>>> pd.Interval(0, 0, closed='left').is_empty
True
>>> pd.Interval(0, 0, closed='neither').is_empty
True
An ``Interval`` that contains a single point is not empty:
>>> pd.Interval(0, 0, closed='both').is_empty
False
An :class:`~arrays.IntervalArray` or :class:`IntervalIndex` returns a
boolean ``ndarray`` positionally indicating if an ``Interval`` is
empty:
>>> ivs = [pd.Interval(0, 0, closed='neither'),
... pd.Interval(1, 2, closed='neither')]
>>> pd.arrays.IntervalArray(ivs).is_empty
array([ True, False])
Missing values are not considered empty:
>>> ivs = [pd.Interval(0, 0, closed='neither'), np.nan]
>>> pd.IntervalIndex(ivs).is_empty
array([ True, False])
- length
- Return the length of the Interval.
- mid
- Return the midpoint of the Interval.
- open_left
- Check if the interval is open on the left side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- open_right
- Check if the interval is open on the right side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
|
class IntervalDtype(PandasExtensionDtype) |
| |
IntervalDtype(subtype=None, closed: 'str_type | None' = None)
An ExtensionDtype for Interval data.
**This is not an actual numpy dtype**, but a duck type.
Parameters
----------
subtype : str, np.dtype
The dtype of the Interval bounds.
Attributes
----------
subtype
Methods
-------
None
Examples
--------
>>> pd.IntervalDtype(subtype='int64', closed='both')
interval[int64, both] |
| |
- Method resolution order:
- IntervalDtype
- PandasExtensionDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'IntervalArray'
- Construct IntervalArray from pyarrow Array/ChunkedArray.
- __hash__(self) -> 'int'
- Return hash(self).
- __setstate__(self, state)
- __str__(self) -> 'str_type'
- Return str(self).
Class methods defined here:
- construct_array_type() -> 'type[IntervalArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
- construct_from_string(string: 'str_type') -> 'IntervalDtype' from builtins.type
- attempt to construct this type from a string, raise a TypeError
if its not possible
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Return a boolean if we if the passed type is an actual dtype that we
can match (via string or type)
Static methods defined here:
- __new__(cls, subtype=None, closed: 'str_type | None' = None)
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- closed
- subtype
- The dtype of the Interval bounds.
- type
- The scalar type for the array, e.g. ``int``
It's expected ``ExtensionArray[item]`` returns an instance
of ``ExtensionDtype.type`` for scalar ``item``, assuming
that value is valid (not NA). NA values do not need to be
instances of `type`.
Data and other attributes defined here:
- __annotations__ = {'_cache_dtypes': 'dict[str_type, PandasExtensionDtype]', 'kind': 'str_type'}
- base = dtype('O')
- kind = 'O'
- name = 'interval'
- num = 103
- str = '|O08'
Methods inherited from PandasExtensionDtype:
- __getstate__(self) -> 'dict[str_type, Any]'
- __repr__(self) -> 'str_type'
- Return a string representation for a particular object.
Class methods inherited from PandasExtensionDtype:
- reset_cache() -> 'None' from builtins.type
- clear the cache
Data and other attributes inherited from PandasExtensionDtype:
- isbuiltin = 0
- isnative = 0
- itemsize = 8
- shape = ()
- subdtype = None
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- na_value
- Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the
user-facing "boxed" version of the NA value, not the physical NA value
for storage. e.g. for JSONArray, this is an empty dictionary.
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class IntervalIndex(pandas.core.indexes.extension.ExtensionIndex) |
| |
IntervalIndex(data, closed=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None, verify_integrity: 'bool' = True) -> 'IntervalIndex'
Immutable index of intervals that are closed on the same side.
.. versionadded:: 0.20.0
Parameters
----------
data : array-like (1-dimensional)
Array-like containing Interval objects from which to build the
IntervalIndex.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both or
neither.
dtype : dtype or None, default None
If None, dtype will be inferred.
copy : bool, default False
Copy the input data.
name : object, optional
Name to be stored in the index.
verify_integrity : bool, default True
Verify that the IntervalIndex is valid.
Attributes
----------
left
right
closed
mid
length
is_empty
is_non_overlapping_monotonic
is_overlapping
values
Methods
-------
from_arrays
from_tuples
from_breaks
contains
overlaps
set_closed
to_tuples
See Also
--------
Index : The base pandas Index type.
Interval : A bounded slice-like interval; the elements of an IntervalIndex.
interval_range : Function to create a fixed frequency IntervalIndex.
cut : Bin values into discrete Intervals.
qcut : Bin values into equal-sized Intervals based on rank or sample quantiles.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#intervalindex>`__
for more.
Examples
--------
A new ``IntervalIndex`` is typically constructed using
:func:`interval_range`:
>>> pd.interval_range(start=0, end=5)
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]],
dtype='interval[int64, right]')
It may also be constructed using one of the constructor
methods: :meth:`IntervalIndex.from_arrays`,
:meth:`IntervalIndex.from_breaks`, and :meth:`IntervalIndex.from_tuples`.
See further examples in the doc strings of ``interval_range`` and the
mentioned constructor methods. |
| |
- Method resolution order:
- IntervalIndex
- pandas.core.indexes.extension.ExtensionIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __array__(self, *args, **kwargs)
- Return the IntervalArray's data as a numpy array of Interval
objects (with dtype='object')
- __contains__(self, key: 'Any') -> 'bool'
- return a boolean if this key is IN the index
We *only* accept an Interval
Parameters
----------
key : Interval
Returns
-------
bool
- __reduce__(self)
- Helper for pickle.
- contains(self, *args, **kwargs)
- Check elementwise if the Intervals contain the value.
Return a boolean mask whether the value is contained in the Intervals
of the IntervalArray.
.. versionadded:: 0.25.0
Parameters
----------
other : scalar
The value to check whether it is contained in the Intervals.
Returns
-------
boolean array
See Also
--------
Interval.contains : Check whether Interval object contains value.
IntervalArray.overlaps : Check if an Interval overlaps the values in the
IntervalArray.
Examples
--------
>>> intervals = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 3), (2, 4)])
>>> intervals
<IntervalArray>
[(0, 1], (1, 3], (2, 4]]
Length: 3, dtype: interval[int64, right]
>>> intervals.contains(0.5)
array([ True, False, False])
- get_indexer_non_unique(self, target: 'Index') -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : IntervalIndex or list of Intervals
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_loc(self, key, method: 'str | None' = None, tolerance=None) -> 'int | slice | np.ndarray'
- Get integer location, slice or boolean mask for requested label.
Parameters
----------
key : label
method : {None}, optional
* default: matches where the label is within an interval only.
Returns
-------
int if unique index, slice if monotonic index, else mask
Examples
--------
>>> i1, i2 = pd.Interval(0, 1), pd.Interval(1, 2)
>>> index = pd.IntervalIndex([i1, i2])
>>> index.get_loc(1)
0
You can also supply a point inside an interval.
>>> index.get_loc(1.5)
1
If a label is in several intervals, you get the locations of all the
relevant intervals.
>>> i3 = pd.Interval(0, 2)
>>> overlapping_index = pd.IntervalIndex([i1, i2, i3])
>>> overlapping_index.get_loc(0.5)
array([ True, False, True])
Only exact matches will be returned if an interval is provided.
>>> index.get_loc(pd.Interval(0, 1))
0
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- overlaps(self, *args, **kwargs)
- Check elementwise if an Interval overlaps the values in the IntervalArray.
Two intervals overlap if they share a common point, including closed
endpoints. Intervals that only have an open endpoint in common do not
overlap.
Parameters
----------
other : IntervalArray
Interval to check against for an overlap.
Returns
-------
ndarray
Boolean array positionally indicating where an overlap occurs.
See Also
--------
Interval.overlaps : Check whether two Interval objects overlap.
Examples
--------
>>> data = [(0, 1), (1, 3), (2, 4)]
>>> intervals = pd.arrays.IntervalArray.from_tuples(data)
>>> intervals
<IntervalArray>
[(0, 1], (1, 3], (2, 4]]
Length: 3, dtype: interval[int64, right]
>>> intervals.overlaps(pd.Interval(0.5, 1.5))
array([ True, True, False])
Intervals that share closed endpoints overlap:
>>> intervals.overlaps(pd.Interval(1, 3, closed='left'))
array([ True, True, True])
Intervals that only have an open endpoint in common do not overlap:
>>> intervals.overlaps(pd.Interval(1, 2, closed='right'))
array([False, True, False])
- set_closed(self, *args, **kwargs)
- Return an IntervalArray identical to the current one, but closed on the
specified side.
Parameters
----------
closed : {'left', 'right', 'both', 'neither'}
Whether the intervals are closed on the left-side, right-side, both
or neither.
Returns
-------
new_index : IntervalArray
Examples
--------
>>> index = pd.arrays.IntervalArray.from_breaks(range(4))
>>> index
<IntervalArray>
[(0, 1], (1, 2], (2, 3]]
Length: 3, dtype: interval[int64, right]
>>> index.set_closed('both')
<IntervalArray>
[[0, 1], [1, 2], [2, 3]]
Length: 3, dtype: interval[int64, both]
- to_tuples(self, *args, **kwargs)
- Return an ndarray of tuples of the form (left, right).
Parameters
----------
na_tuple : bool, default True
Returns NA as a tuple if True, ``(nan, nan)``, or just as the NA
value itself if False, ``nan``.
Returns
-------
tuples: ndarray
Class methods defined here:
- from_arrays(left, right, closed: 'str' = 'right', name: 'Hashable' = None, copy: 'bool' = False, dtype: 'Dtype | None' = None) -> 'IntervalIndex' from builtins.type
- Construct from two arrays defining the left and right bounds.
Parameters
----------
left : array-like (1-dimensional)
Left bounds for each interval.
right : array-like (1-dimensional)
Right bounds for each interval.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
or neither.
copy : bool, default False
Copy the data.
dtype : dtype, optional
If None, dtype will be inferred.
Returns
-------
IntervalIndex
Raises
------
ValueError
When a value is missing in only one of `left` or `right`.
When a value in `left` is greater than the corresponding value
in `right`.
See Also
--------
interval_range : Function to create a fixed frequency IntervalIndex.
IntervalIndex.from_breaks : Construct an IntervalIndex from an array of
splits.
IntervalIndex.from_tuples : Construct an IntervalIndex from an
array-like of tuples.
Notes
-----
Each element of `left` must be less than or equal to the `right`
element at the same position. If an element is missing, it must be
missing in both `left` and `right`. A TypeError is raised when
using an unsupported type for `left` or `right`. At the moment,
'category', 'object', and 'string' subtypes are not supported.
Examples
--------
>>> pd.IntervalIndex.from_arrays([0, 1, 2], [1, 2, 3])
IntervalIndex([(0, 1], (1, 2], (2, 3]],
dtype='interval[int64, right]')
- from_breaks(breaks, closed: 'str' = 'right', name: 'Hashable' = None, copy: 'bool' = False, dtype: 'Dtype | None' = None) -> 'IntervalIndex' from builtins.type
- Construct an IntervalIndex from an array of splits.
Parameters
----------
breaks : array-like (1-dimensional)
Left and right bounds for each interval.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
or neither.
copy : bool, default False
Copy the data.
dtype : dtype or None, default None
If None, dtype will be inferred.
Returns
-------
IntervalIndex
See Also
--------
interval_range : Function to create a fixed frequency IntervalIndex.
IntervalIndex.from_arrays : Construct from a left and right array.
IntervalIndex.from_tuples : Construct from a sequence of tuples.
Examples
--------
>>> pd.IntervalIndex.from_breaks([0, 1, 2, 3])
IntervalIndex([(0, 1], (1, 2], (2, 3]],
dtype='interval[int64, right]')
- from_tuples(data, closed: 'str' = 'right', name: 'Hashable' = None, copy: 'bool' = False, dtype: 'Dtype | None' = None) -> 'IntervalIndex' from builtins.type
- Construct an IntervalIndex from an array-like of tuples.
Parameters
----------
data : array-like (1-dimensional)
Array of tuples.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
or neither.
copy : bool, default False
By-default copy the data, this is compat only and ignored.
dtype : dtype or None, default None
If None, dtype will be inferred.
Returns
-------
IntervalIndex
See Also
--------
interval_range : Function to create a fixed frequency IntervalIndex.
IntervalIndex.from_arrays : Construct an IntervalIndex from a left and
right array.
IntervalIndex.from_breaks : Construct an IntervalIndex from an array of
splits.
Examples
--------
>>> pd.IntervalIndex.from_tuples([(0, 1), (1, 2)])
IntervalIndex([(0, 1], (1, 2]],
dtype='interval[int64, right]')
Static methods defined here:
- __new__(cls, data, closed=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None, verify_integrity: 'bool' = True) -> 'IntervalIndex'
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- inferred_type
- Return a string of the type inferred from the values
- is_overlapping
- Return True if the IntervalIndex has overlapping intervals, else False.
Two intervals overlap if they share a common point, including closed
endpoints. Intervals that only have an open endpoint in common do not
overlap.
Returns
-------
bool
Boolean indicating if the IntervalIndex has overlapping intervals.
See Also
--------
Interval.overlaps : Check whether two Interval objects overlap.
IntervalIndex.overlaps : Check an IntervalIndex elementwise for
overlaps.
Examples
--------
>>> index = pd.IntervalIndex.from_tuples([(0, 2), (1, 3), (4, 5)])
>>> index
IntervalIndex([(0, 2], (1, 3], (4, 5]],
dtype='interval[int64, right]')
>>> index.is_overlapping
True
Intervals that share closed endpoints overlap:
>>> index = pd.interval_range(0, 3, closed='both')
>>> index
IntervalIndex([[0, 1], [1, 2], [2, 3]],
dtype='interval[int64, both]')
>>> index.is_overlapping
True
Intervals that only have an open endpoint in common do not overlap:
>>> index = pd.interval_range(0, 3, closed='left')
>>> index
IntervalIndex([[0, 1), [1, 2), [2, 3)],
dtype='interval[int64, left]')
>>> index.is_overlapping
False
- length
Data descriptors defined here:
- closed
- Whether the intervals are closed on the left-side, right-side, both or
neither.
- closed_left
- Check if the interval is closed on the left side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- closed_right
- Check if the interval is closed on the right side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- is_empty
- Indicates if an interval is empty, meaning it contains no points.
.. versionadded:: 0.25.0
Returns
-------
bool or ndarray
A boolean indicating if a scalar :class:`Interval` is empty, or a
boolean ``ndarray`` positionally indicating if an ``Interval`` in
an :class:`~arrays.IntervalArray` or :class:`IntervalIndex` is
empty.
Examples
--------
An :class:`Interval` that contains points is not empty:
>>> pd.Interval(0, 1, closed='right').is_empty
False
An ``Interval`` that does not contain any points is empty:
>>> pd.Interval(0, 0, closed='right').is_empty
True
>>> pd.Interval(0, 0, closed='left').is_empty
True
>>> pd.Interval(0, 0, closed='neither').is_empty
True
An ``Interval`` that contains a single point is not empty:
>>> pd.Interval(0, 0, closed='both').is_empty
False
An :class:`~arrays.IntervalArray` or :class:`IntervalIndex` returns a
boolean ``ndarray`` positionally indicating if an ``Interval`` is
empty:
>>> ivs = [pd.Interval(0, 0, closed='neither'),
... pd.Interval(1, 2, closed='neither')]
>>> pd.arrays.IntervalArray(ivs).is_empty
array([ True, False])
Missing values are not considered empty:
>>> ivs = [pd.Interval(0, 0, closed='neither'), np.nan]
>>> pd.IntervalIndex(ivs).is_empty
array([ True, False])
- is_monotonic_decreasing
- Return True if the IntervalIndex is monotonic decreasing (only equal or
decreasing values), else False
- is_non_overlapping_monotonic
- Return True if the IntervalArray is non-overlapping (no Intervals share
points) and is either monotonic increasing or monotonic decreasing,
else False.
- is_unique
- Return True if the IntervalIndex contains unique elements, else False.
- left
- mid
- open_left
- Check if the interval is open on the left side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- open_right
- Check if the interval is open on the right side.
For the meaning of `closed` and `open` see :class:`~pandas.Interval`.
Returns
-------
bool
True if the Interval is closed on the left-side.
- right
Data and other attributes defined here:
- __annotations__ = {'_data': 'IntervalArray', '_values': 'IntervalArray', 'closed': 'str', 'closed_left': 'bool', 'closed_right': 'bool', 'is_non_overlapping_monotonic': 'bool'}
Methods inherited from pandas.core.indexes.extension.ExtensionIndex:
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
Methods inherited from pandas.core.indexes.base.Index:
- __abs__(self)
- __and__(self, other)
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other)
- __invert__(self)
- __len__(self) -> 'int'
- Return the length of the Index.
- __neg__(self)
- __nonzero__(self)
- __or__(self, other)
- __pos__(self)
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Return the integer indices that would sort the index.
Parameters
----------
*args
Passed to `numpy.ndarray.argsort`.
**kwargs
Passed to `numpy.ndarray.argsort`.
Returns
-------
np.ndarray[np.intp]
Integer indices that would sort the index if used as
an indexer.
See Also
--------
numpy.argsort : Similar method for NumPy arrays.
Index.sort_values : Return sorted copy of Index.
Examples
--------
>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')
>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])
>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- astype(self, dtype, copy: 'bool' = True)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- copy(self: '_IndexT', name: 'Hashable | None' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names: 'Sequence[Hashable] | None' = None) -> '_IndexT'
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- delete(self: '_IndexT', loc) -> '_IndexT'
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- equals(self, other: 'Any') -> 'bool'
- Determine if two Index object are equal.
The things that are being compared are:
* The elements inside the Index object.
* The order of the elements inside the Index object.
Parameters
----------
other : Any
The other object to compare against.
Returns
-------
bool
True if "other" is an Index and it has the same elements and order
as the calling index; False otherwise.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3])
>>> idx1
Int64Index([1, 2, 3], dtype='int64')
>>> idx1.equals(pd.Index([1, 2, 3]))
True
The elements inside are compared
>>> idx2 = pd.Index(["1", "2", "3"])
>>> idx2
Index(['1', '2', '3'], dtype='object')
>>> idx1.equals(idx2)
False
The order is compared
>>> ascending_idx = pd.Index([1, 2, 3])
>>> ascending_idx
Int64Index([1, 2, 3], dtype='int64')
>>> descending_idx = pd.Index([3, 2, 1])
>>> descending_idx
Int64Index([3, 2, 1], dtype='int64')
>>> ascending_idx.equals(descending_idx)
False
The dtype is *not* compared
>>> int64_idx = pd.Int64Index([1, 2, 3])
>>> int64_idx
Int64Index([1, 2, 3], dtype='int64')
>>> uint64_idx = pd.UInt64Index([1, 2, 3])
>>> uint64_idx
UInt64Index([1, 2, 3], dtype='uint64')
>>> int64_idx.equals(uint64_idx)
True
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str_t' = 'NaN') -> 'list[str_t]'
- Render a string representation of the Index.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- insert(self, loc: 'int', item) -> 'Index'
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- is_type_compatible(self, kind: 'str_t') -> 'bool'
- Whether the index type is compatible with the provided type.
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- max(self, axis=None, skipna=True, *args, **kwargs)
- Return the maximum value of the Index.
Parameters
----------
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Maximum value.
See Also
--------
Index.min : Return the minimum value in an Index.
Series.max : Return the maximum value in a Series.
DataFrame.max : Return the maximum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'
For a MultiIndex, the maximum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.max()
('b', 2)
- min(self, axis=None, skipna=True, *args, **kwargs)
- Return the minimum value of the Index.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Minimum value.
See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
For a MultiIndex, the minimum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.min()
('a', 1)
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- shift(self, periods=1, freq=None)
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or str, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.Index
Shifted index.
See Also
--------
Series.shift : Shift values of Series.
Notes
-----
This method is only implemented for datetime-like index classes,
i.e., DatetimeIndex, PeriodIndex and TimedeltaIndex.
Examples
--------
Put the first 5 month starts of 2011 into an index.
>>> month_starts = pd.date_range('1/1/2011', periods=5, freq='MS')
>>> month_starts
DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01',
'2011-05-01'],
dtype='datetime64[ns]', freq='MS')
Shift the index by 10 days.
>>> month_starts.shift(10, freq='D')
DatetimeIndex(['2011-01-11', '2011-02-11', '2011-03-11', '2011-04-11',
'2011-05-11'],
dtype='datetime64[ns]', freq=None)
The default value of `freq` is the `freq` attribute of the index,
which is 'MS' (month start) in this example.
>>> month_starts.shift(10)
DatetimeIndex(['2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01'],
dtype='datetime64[ns]', freq='MS')
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- take(self, indices, axis: 'int' = 0, allow_fill: 'bool' = True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- is_monotonic_increasing
- Return if the index is monotonic increasing (only equal or
increasing) values.
Examples
--------
>>> Index([1, 2, 3]).is_monotonic_increasing
True
>>> Index([1, 2, 2]).is_monotonic_increasing
True
>>> Index([1, 3, 2]).is_monotonic_increasing
False
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- dtype
- Return the dtype object of the underlying data.
- hasnans
- Return True if there are any NaNs.
Enables various performance speedups.
- is_all_dates
- Whether or not the index values only consist of dates.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class MultiIndex(pandas.core.indexes.base.Index) |
| |
MultiIndex(levels=None, codes=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity: 'bool' = True)
A multi-level, or hierarchical, index object for pandas objects.
Parameters
----------
levels : sequence of arrays
The unique labels for each level.
codes : sequence of arrays
Integers for each level designating which label at each location.
sortorder : optional int
Level of sortedness (must be lexicographically sorted by that
level).
names : optional sequence of objects
Names for each of the index levels. (name is accepted for compat).
copy : bool, default False
Copy the meta-data.
verify_integrity : bool, default True
Check that the levels/codes are consistent and valid.
Attributes
----------
names
levels
codes
nlevels
levshape
Methods
-------
from_arrays
from_tuples
from_product
from_frame
set_levels
set_codes
to_frame
to_flat_index
sortlevel
droplevel
swaplevel
reorder_levels
remove_unused_levels
get_locs
See Also
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_product : Create a MultiIndex from the cartesian product
of iterables.
MultiIndex.from_tuples : Convert list of tuples to a MultiIndex.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
Index : The base pandas Index type.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html>`__
for more.
Examples
--------
A new ``MultiIndex`` is typically constructed using one of the helper
methods :meth:`MultiIndex.from_arrays`, :meth:`MultiIndex.from_product`
and :meth:`MultiIndex.from_tuples`. For example (using ``.from_arrays``):
>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
>>> pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
MultiIndex([(1, 'red'),
(1, 'blue'),
(2, 'red'),
(2, 'blue')],
names=['number', 'color'])
See further examples for how to construct a MultiIndex in the doc strings
of the mentioned helper methods. |
| |
- Method resolution order:
- MultiIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __abs__(self, other=None)
- __add__(self, other=None)
- __array__(self, dtype=None) -> 'np.ndarray'
- the array interface, return my values
- __contains__(self, key: 'Any') -> 'bool'
- Return a boolean indicating whether the provided key is in the index.
Parameters
----------
key : label
The key to check if it is present in the index.
Returns
-------
bool
Whether the key search is in the index.
Raises
------
TypeError
If the key is not hashable.
See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
list-like key is in the index.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> 2 in idx
True
>>> 6 in idx
False
- __divmod__(self, other=None)
- __floordiv__(self, other=None)
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other=None)
- __invert__(self, other=None)
- __isub__(self, other=None)
- __len__(self) -> 'int'
- Return the length of the Index.
- __mod__(self, other=None)
- __mul__(self, other=None)
- __neg__(self, other=None)
- __pos__(self, other=None)
- __pow__(self, other=None)
- __radd__(self, other=None)
- __rdivmod__(self, other=None)
- __reduce__(self)
- Necessary for making this object picklable
- __rfloordiv__(self, other=None)
- __rmod__(self, other=None)
- __rmul__(self, other=None)
- __rpow__(self, other=None)
- __rsub__(self, other=None)
- __rtruediv__(self, other=None)
- __sub__(self, other=None)
- __truediv__(self, other=None)
- append(self, other)
- Append a collection of Index options together
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
appended : Index
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Return the integer indices that would sort the index.
Parameters
----------
*args
Passed to `numpy.ndarray.argsort`.
**kwargs
Passed to `numpy.ndarray.argsort`.
Returns
-------
np.ndarray[np.intp]
Integer indices that would sort the index if used as
an indexer.
See Also
--------
numpy.argsort : Similar method for NumPy arrays.
Index.sort_values : Return sorted copy of Index.
Examples
--------
>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')
>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])
>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')
- astype(self, dtype, copy: 'bool' = True)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- copy(self, names=None, dtype=None, levels=None, codes=None, deep=False, name=None)
- Make a copy of this object. Names, dtype, levels and codes can be
passed and will be set on new copy.
Parameters
----------
names : sequence, optional
dtype : numpy dtype or pandas type, optional
.. deprecated:: 1.2.0
levels : sequence, optional
.. deprecated:: 1.2.0
codes : sequence, optional
.. deprecated:: 1.2.0
deep : bool, default False
name : Label
Kept for compatibility with 1-dimensional Index. Should not be used.
Returns
-------
MultiIndex
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
This could be potentially expensive on large MultiIndex objects.
- delete(self, loc) -> 'MultiIndex'
- Make new index with passed location deleted
Returns
-------
new_index : MultiIndex
- drop(self, codes, level=None, errors='raise')
- Make new MultiIndex with passed list of codes deleted
Parameters
----------
codes : array-like
Must be a list of tuples when level is not specified
level : int or level name, default None
errors : str, default 'raise'
Returns
-------
dropped : MultiIndex
- drop_duplicates(self, keep: 'str | bool' = 'first') -> 'MultiIndex'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- dropna(self, how: 'str' = 'any') -> 'MultiIndex'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep='first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- equal_levels(self, other: 'MultiIndex') -> 'bool'
- Return True if the levels of both MultiIndex objects are the same
- equals(self, other: 'object') -> 'bool'
- Determines if two MultiIndex objects have the same labeling information
(the levels themselves do not necessarily have to be the same)
See Also
--------
equal_levels
- fillna(self, value=None, downcast=None)
- fillna is not implemented for MultiIndex
- format(self, name: 'bool | None' = None, formatter: 'Callable | None' = None, na_rep: 'str | None' = None, names: 'bool' = False, space: 'int' = 2, sparsify=None, adjoin: 'bool' = True) -> 'list'
- Render a string representation of the Index.
- get_level_values(self, level)
- Return vector of label values for requested level.
Length of returned vector is equal to the length of the index.
Parameters
----------
level : int or str
``level`` is either the integer position of the level in the
MultiIndex, or the name of the level.
Returns
-------
values : Index
Values is a level of this MultiIndex converted to
a single :class:`Index` (or subclass thereof).
Notes
-----
If the level contains missing values, the result may be casted to
``float`` with missing values specified as ``NaN``. This is because
the level is converted to a regular ``Index``.
Examples
--------
Create a MultiIndex:
>>> mi = pd.MultiIndex.from_arrays((list('abc'), list('def')))
>>> mi.names = ['level_1', 'level_2']
Get level values by supplying level as either integer or name:
>>> mi.get_level_values(0)
Index(['a', 'b', 'c'], dtype='object', name='level_1')
>>> mi.get_level_values('level_2')
Index(['d', 'e', 'f'], dtype='object', name='level_2')
If a level contains missing values, the return type of the level
maybe casted to ``float``.
>>> pd.MultiIndex.from_arrays([[1, None, 2], [3, 4, 5]]).dtypes
level_0 int64
level_1 int64
dtype: object
>>> pd.MultiIndex.from_arrays([[1, None, 2], [3, 4, 5]]).get_level_values(0)
Float64Index([1.0, nan, 2.0], dtype='float64')
- get_loc(self, key, method=None)
- Get location for a label or a tuple of labels.
The location is returned as an integer/slice or boolean
mask.
Parameters
----------
key : label or tuple of labels (one for each level)
method : None
Returns
-------
loc : int, slice object or boolean mask
If the key is past the lexsort depth, the return may be a
boolean mask array, otherwise it is always a slice or int.
See Also
--------
Index.get_loc : The get_loc method for (single-level) index.
MultiIndex.slice_locs : Get slice location given start label(s) and
end label(s).
MultiIndex.get_locs : Get location for a label/slice/list/mask or a
sequence of such.
Notes
-----
The key cannot be a slice, list of same-level labels, a boolean mask,
or a sequence of such. If you want to use those, use
:meth:`MultiIndex.get_locs` instead.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([list('abb'), list('def')])
>>> mi.get_loc('b')
slice(1, 3, None)
>>> mi.get_loc(('b', 'e'))
1
- get_loc_level(self, key, level=0, drop_level: 'bool' = True)
- Get location and sliced index for requested label(s)/level(s).
Parameters
----------
key : label or sequence of labels
level : int/level name or list thereof, optional
drop_level : bool, default True
If ``False``, the resulting index will not drop any level.
Returns
-------
loc : A 2-tuple where the elements are:
Element 0: int, slice object or boolean array
Element 1: The resulting sliced multiindex/index. If the key
contains all levels, this will be ``None``.
See Also
--------
MultiIndex.get_loc : Get location for a label or a tuple of labels.
MultiIndex.get_locs : Get location for a label/slice/list/mask or a
sequence of such.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([list('abb'), list('def')],
... names=['A', 'B'])
>>> mi.get_loc_level('b')
(slice(1, 3, None), Index(['e', 'f'], dtype='object', name='B'))
>>> mi.get_loc_level('e', level='B')
(array([False, True, False]), Index(['b'], dtype='object', name='A'))
>>> mi.get_loc_level(['b', 'e'])
(1, None)
- get_locs(self, seq)
- Get location for a sequence of labels.
Parameters
----------
seq : label, slice, list, mask or a sequence of such
You should use one of the above for each level.
If a level should not be used, set it to ``slice(None)``.
Returns
-------
numpy.ndarray
NumPy array of integers suitable for passing to iloc.
See Also
--------
MultiIndex.get_loc : Get location for a label or a tuple of labels.
MultiIndex.slice_locs : Get slice location given start label(s) and
end label(s).
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([list('abb'), list('def')])
>>> mi.get_locs('b') # doctest: +SKIP
array([1, 2], dtype=int64)
>>> mi.get_locs([slice(None), ['e', 'f']]) # doctest: +SKIP
array([1, 2], dtype=int64)
>>> mi.get_locs([[True, False, True], slice('e', 'f')]) # doctest: +SKIP
array([2], dtype=int64)
- get_slice_bound(self, label: 'Hashable | Sequence[Hashable]', side: 'str', kind=<no_default>) -> 'int'
- For an ordered MultiIndex, compute slice bound
that corresponds to given label.
Returns leftmost (one-past-the-rightmost if `side=='right') position
of given label.
Parameters
----------
label : object or tuple of objects
side : {'left', 'right'}
kind : {'loc', 'getitem', None}
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
Notes
-----
This method only works if level 0 index of the MultiIndex is lexsorted.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([list('abbc'), list('gefd')])
Get the locations from the leftmost 'b' in the first level
until the end of the multiindex:
>>> mi.get_slice_bound('b', side="left")
1
Like above, but if you get the locations from the rightmost
'b' in the first level and 'f' in the second level:
>>> mi.get_slice_bound(('b','f'), side="right")
3
See Also
--------
MultiIndex.get_loc : Get location for a label or a tuple of labels.
MultiIndex.get_locs : Get location for a label/slice/list/mask or a
sequence of such.
- insert(self, loc: 'int', item) -> 'MultiIndex'
- Make new MultiIndex inserting new item at location
Parameters
----------
loc : int
item : tuple
Must be same length as number of levels in the MultiIndex
Returns
-------
new_index : Index
- is_lexsorted(self) -> 'bool'
- isin(self, values, level=None) -> 'npt.NDArray[np.bool_]'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- remove_unused_levels(self) -> 'MultiIndex'
- Create new MultiIndex from current that removes unused levels.
Unused level(s) means levels that are not expressed in the
labels. The resulting MultiIndex will have the same outward
appearance, meaning the same .values and ordering. It will
also be .equals() to the original.
Returns
-------
MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_product([range(2), list('ab')])
>>> mi
MultiIndex([(0, 'a'),
(0, 'b'),
(1, 'a'),
(1, 'b')],
)
>>> mi[2:]
MultiIndex([(1, 'a'),
(1, 'b')],
)
The 0 from the first level is not represented
and can be removed
>>> mi2 = mi[2:].remove_unused_levels()
>>> mi2.levels
FrozenList([[1], ['a', 'b']])
- rename = set_names(self, names, level=None, inplace: 'bool' = False) -> 'MultiIndex | None'
- reorder_levels(self, order) -> 'MultiIndex'
- Rearrange levels using input order. May not drop or duplicate levels.
Parameters
----------
order : list of int or list of str
List representing new level order. Reference level by number
(position) or by key (label).
Returns
-------
MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([[1, 2], [3, 4]], names=['x', 'y'])
>>> mi
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.reorder_levels(order=[1, 0])
MultiIndex([(3, 1),
(4, 2)],
names=['y', 'x'])
>>> mi.reorder_levels(order=['y', 'x'])
MultiIndex([(3, 1),
(4, 2)],
names=['y', 'x'])
- repeat(self, repeats: 'int', axis=None) -> 'MultiIndex'
- Repeat elements of a MultiIndex.
Returns a new MultiIndex where each element of the current MultiIndex
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
MultiIndex.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : MultiIndex
Newly created MultiIndex with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_codes(self, codes, level=None, inplace=None, verify_integrity: 'bool' = True)
- Set new codes on MultiIndex. Defaults to returning new index.
Parameters
----------
codes : sequence or list of sequence
New codes to apply.
level : int, level name, or sequence of int/level names (default None)
Level(s) to set (None for all levels).
inplace : bool
If True, mutates in place.
.. deprecated:: 1.2.0
verify_integrity : bool, default True
If True, checks that levels and codes are compatible.
Returns
-------
new index (of same type and class...etc) or None
The same type as the caller or None if ``inplace=True``.
Examples
--------
>>> idx = pd.MultiIndex.from_tuples(
... [(1, "one"), (1, "two"), (2, "one"), (2, "two")], names=["foo", "bar"]
... )
>>> idx
MultiIndex([(1, 'one'),
(1, 'two'),
(2, 'one'),
(2, 'two')],
names=['foo', 'bar'])
>>> idx.set_codes([[1, 0, 1, 0], [0, 0, 1, 1]])
MultiIndex([(2, 'one'),
(1, 'one'),
(2, 'two'),
(1, 'two')],
names=['foo', 'bar'])
>>> idx.set_codes([1, 0, 1, 0], level=0)
MultiIndex([(2, 'one'),
(1, 'two'),
(2, 'one'),
(1, 'two')],
names=['foo', 'bar'])
>>> idx.set_codes([0, 0, 1, 1], level='bar')
MultiIndex([(1, 'one'),
(1, 'one'),
(2, 'two'),
(2, 'two')],
names=['foo', 'bar'])
>>> idx.set_codes([[1, 0, 1, 0], [0, 0, 1, 1]], level=[0, 1])
MultiIndex([(2, 'one'),
(1, 'one'),
(2, 'two'),
(1, 'two')],
names=['foo', 'bar'])
- set_levels(self, levels, level=None, inplace=None, verify_integrity: 'bool' = True)
- Set new levels on MultiIndex. Defaults to returning new index.
Parameters
----------
levels : sequence or list of sequence
New level(s) to apply.
level : int, level name, or sequence of int/level names (default None)
Level(s) to set (None for all levels).
inplace : bool
If True, mutates in place.
.. deprecated:: 1.2.0
verify_integrity : bool, default True
If True, checks that levels and codes are compatible.
Returns
-------
new index (of same type and class...etc) or None
The same type as the caller or None if ``inplace=True``.
Examples
--------
>>> idx = pd.MultiIndex.from_tuples(
... [
... (1, "one"),
... (1, "two"),
... (2, "one"),
... (2, "two"),
... (3, "one"),
... (3, "two")
... ],
... names=["foo", "bar"]
... )
>>> idx
MultiIndex([(1, 'one'),
(1, 'two'),
(2, 'one'),
(2, 'two'),
(3, 'one'),
(3, 'two')],
names=['foo', 'bar'])
>>> idx.set_levels([['a', 'b', 'c'], [1, 2]])
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2),
('c', 1),
('c', 2)],
names=['foo', 'bar'])
>>> idx.set_levels(['a', 'b', 'c'], level=0)
MultiIndex([('a', 'one'),
('a', 'two'),
('b', 'one'),
('b', 'two'),
('c', 'one'),
('c', 'two')],
names=['foo', 'bar'])
>>> idx.set_levels(['a', 'b'], level='bar')
MultiIndex([(1, 'a'),
(1, 'b'),
(2, 'a'),
(2, 'b'),
(3, 'a'),
(3, 'b')],
names=['foo', 'bar'])
If any of the levels passed to ``set_levels()`` exceeds the
existing length, all of the values from that argument will
be stored in the MultiIndex levels, though the values will
be truncated in the MultiIndex output.
>>> idx.set_levels([['a', 'b', 'c'], [1, 2, 3, 4]], level=[0, 1])
MultiIndex([('a', 1),
('a', 2),
('b', 1),
('b', 2),
('c', 1),
('c', 2)],
names=['foo', 'bar'])
>>> idx.set_levels([['a', 'b', 'c'], [1, 2, 3, 4]], level=[0, 1]).levels
FrozenList([['a', 'b', 'c'], [1, 2, 3, 4]])
- set_names(self, names, level=None, inplace: 'bool' = False) -> 'MultiIndex | None'
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- For an ordered MultiIndex, compute the slice locations for input
labels.
The input labels can be tuples representing partial levels, e.g. for a
MultiIndex with 3 levels, you can pass a single value (corresponding to
the first level), or a 1-, 2-, or 3-tuple.
Parameters
----------
start : label or tuple, default None
If None, defaults to the beginning
end : label or tuple
If None, defaults to the end
step : int or None
Slice step
kind : string, optional, defaults None
.. deprecated:: 1.4.0
Returns
-------
(start, end) : (int, int)
Notes
-----
This method only works if the MultiIndex is properly lexsorted. So,
if only the first 2 levels of a 3-level MultiIndex are lexsorted,
you can only pass two levels to ``.slice_locs``.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([list('abbd'), list('deff')],
... names=['A', 'B'])
Get the slice locations from the beginning of 'b' in the first level
until the end of the multiindex:
>>> mi.slice_locs(start='b')
(1, 4)
Like above, but stop at the end of 'b' in the first level and 'f' in
the second level:
>>> mi.slice_locs(start='b', end=('b', 'f'))
(1, 3)
See Also
--------
MultiIndex.get_loc : Get location for a label or a tuple of labels.
MultiIndex.get_locs : Get location for a label/slice/list/mask or a
sequence of such.
- sortlevel(self, level=0, ascending: 'bool' = True, sort_remaining: 'bool' = True) -> 'tuple[MultiIndex, npt.NDArray[np.intp]]'
- Sort MultiIndex at the requested level.
The result will respect the original ordering of the associated
factor at that level.
Parameters
----------
level : list-like, int or str, default 0
If a string is given, must be a name of the level.
If list-like must be names or ints of levels.
ascending : bool, default True
False to sort in descending order.
Can also be a list to specify a directed ordering.
sort_remaining : sort by the remaining levels after level
Returns
-------
sorted_index : pd.MultiIndex
Resulting index.
indexer : np.ndarray[np.intp]
Indices of output values in original index.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([[0, 0], [2, 1]])
>>> mi
MultiIndex([(0, 2),
(0, 1)],
)
>>> mi.sortlevel()
(MultiIndex([(0, 1),
(0, 2)],
), array([1, 0]))
>>> mi.sortlevel(sort_remaining=False)
(MultiIndex([(0, 2),
(0, 1)],
), array([0, 1]))
>>> mi.sortlevel(1)
(MultiIndex([(0, 1),
(0, 2)],
), array([1, 0]))
>>> mi.sortlevel(1, ascending=False)
(MultiIndex([(0, 2),
(0, 1)],
), array([0, 1]))
- swaplevel(self, i=-2, j=-1) -> 'MultiIndex'
- Swap level i with level j.
Calling this method does not change the ordering of the values.
Parameters
----------
i : int, str, default -2
First level of index to be swapped. Can pass level name as string.
Type of parameters can be mixed.
j : int, str, default -1
Second level of index to be swapped. Can pass level name as string.
Type of parameters can be mixed.
Returns
-------
MultiIndex
A new MultiIndex.
See Also
--------
Series.swaplevel : Swap levels i and j in a MultiIndex.
Dataframe.swaplevel : Swap levels i and j in a MultiIndex on a
particular axis.
Examples
--------
>>> mi = pd.MultiIndex(levels=[['a', 'b'], ['bb', 'aa']],
... codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
>>> mi
MultiIndex([('a', 'bb'),
('a', 'aa'),
('b', 'bb'),
('b', 'aa')],
)
>>> mi.swaplevel(0, 1)
MultiIndex([('bb', 'a'),
('aa', 'a'),
('bb', 'b'),
('aa', 'b')],
)
- take(self: 'MultiIndex', indices, axis: 'int' = 0, allow_fill: 'bool' = True, fill_value=None, **kwargs) -> 'MultiIndex'
- Return a new MultiIndex of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
- to_flat_index(self) -> 'Index'
- Convert a MultiIndex to an Index of Tuples containing the level values.
Returns
-------
pd.Index
Index with the MultiIndex data represented in Tuples.
See Also
--------
MultiIndex.from_tuples : Convert flat index back to MultiIndex.
Notes
-----
This method will simply return the caller if called by anything other
than a MultiIndex.
Examples
--------
>>> index = pd.MultiIndex.from_product(
... [['foo', 'bar'], ['baz', 'qux']],
... names=['a', 'b'])
>>> index.to_flat_index()
Index([('foo', 'baz'), ('foo', 'qux'),
('bar', 'baz'), ('bar', 'qux')],
dtype='object')
- to_frame(self, index: 'bool' = True, name=<no_default>) -> 'DataFrame'
- Create a DataFrame with the levels of the MultiIndex as columns.
Column ordering is determined by the DataFrame constructor with data as
a dict.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original MultiIndex.
name : list / sequence of str, optional
The passed names should substitute index level names.
Returns
-------
DataFrame : a DataFrame containing the original MultiIndex data.
See Also
--------
DataFrame : Two-dimensional, size-mutable, potentially heterogeneous
tabular data.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([['a', 'b'], ['c', 'd']])
>>> mi
MultiIndex([('a', 'c'),
('b', 'd')],
)
>>> df = mi.to_frame()
>>> df
0 1
a c a c
b d b d
>>> df = mi.to_frame(index=False)
>>> df
0 1
0 a c
1 b d
>>> df = mi.to_frame(name=['x', 'y'])
>>> df
x y
a c a c
b d b d
- truncate(self, before=None, after=None) -> 'MultiIndex'
- Slice index between two labels / tuples, return new MultiIndex
Parameters
----------
before : label or tuple, can be partial. Default None
None defaults to start
after : label or tuple, can be partial. Default None
None defaults to end
Returns
-------
truncated : MultiIndex
- unique(self, level=None)
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- this is defined as a copy with the same identity
Class methods defined here:
- from_arrays(arrays, sortorder=None, names=<no_default>) -> 'MultiIndex' from builtins.type
- Convert arrays to MultiIndex.
Parameters
----------
arrays : list / sequence of array-likes
Each array-like gives one level's value for each data point.
len(arrays) is the number of levels.
sortorder : int or None
Level of sortedness (must be lexicographically sorted by that
level).
names : list / sequence of str, optional
Names for the levels in the index.
Returns
-------
MultiIndex
See Also
--------
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
MultiIndex.from_product : Make a MultiIndex from cartesian product
of iterables.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
Examples
--------
>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
>>> pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
MultiIndex([(1, 'red'),
(1, 'blue'),
(2, 'red'),
(2, 'blue')],
names=['number', 'color'])
- from_frame(df: 'DataFrame', sortorder=None, names=None) -> 'MultiIndex' from builtins.type
- Make a MultiIndex from a DataFrame.
Parameters
----------
df : DataFrame
DataFrame to be converted to MultiIndex.
sortorder : int, optional
Level of sortedness (must be lexicographically sorted by that
level).
names : list-like, optional
If no names are provided, use the column names, or tuple of column
names if the columns is a MultiIndex. If a sequence, overwrite
names with the given sequence.
Returns
-------
MultiIndex
The MultiIndex representation of the given DataFrame.
See Also
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
MultiIndex.from_product : Make a MultiIndex from cartesian product
of iterables.
Examples
--------
>>> df = pd.DataFrame([['HI', 'Temp'], ['HI', 'Precip'],
... ['NJ', 'Temp'], ['NJ', 'Precip']],
... columns=['a', 'b'])
>>> df
a b
0 HI Temp
1 HI Precip
2 NJ Temp
3 NJ Precip
>>> pd.MultiIndex.from_frame(df)
MultiIndex([('HI', 'Temp'),
('HI', 'Precip'),
('NJ', 'Temp'),
('NJ', 'Precip')],
names=['a', 'b'])
Using explicit names, instead of the column names
>>> pd.MultiIndex.from_frame(df, names=['state', 'observation'])
MultiIndex([('HI', 'Temp'),
('HI', 'Precip'),
('NJ', 'Temp'),
('NJ', 'Precip')],
names=['state', 'observation'])
- from_product(iterables, sortorder=None, names=<no_default>) -> 'MultiIndex' from builtins.type
- Make a MultiIndex from the cartesian product of multiple iterables.
Parameters
----------
iterables : list / sequence of iterables
Each iterable has unique labels for each level of the index.
sortorder : int or None
Level of sortedness (must be lexicographically sorted by that
level).
names : list / sequence of str, optional
Names for the levels in the index.
.. versionchanged:: 1.0.0
If not explicitly provided, names will be inferred from the
elements of iterables if an element has a name attribute
Returns
-------
MultiIndex
See Also
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_tuples : Convert list of tuples to MultiIndex.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
Examples
--------
>>> numbers = [0, 1, 2]
>>> colors = ['green', 'purple']
>>> pd.MultiIndex.from_product([numbers, colors],
... names=['number', 'color'])
MultiIndex([(0, 'green'),
(0, 'purple'),
(1, 'green'),
(1, 'purple'),
(2, 'green'),
(2, 'purple')],
names=['number', 'color'])
- from_tuples(tuples: 'Iterable[tuple[Hashable, ...]]', sortorder: 'int | None' = None, names: 'Sequence[Hashable] | None' = None) -> 'MultiIndex' from builtins.type
- Convert list of tuples to MultiIndex.
Parameters
----------
tuples : list / sequence of tuple-likes
Each tuple is the index of one row/column.
sortorder : int or None
Level of sortedness (must be lexicographically sorted by that
level).
names : list / sequence of str, optional
Names for the levels in the index.
Returns
-------
MultiIndex
See Also
--------
MultiIndex.from_arrays : Convert list of arrays to MultiIndex.
MultiIndex.from_product : Make a MultiIndex from cartesian product
of iterables.
MultiIndex.from_frame : Make a MultiIndex from a DataFrame.
Examples
--------
>>> tuples = [(1, 'red'), (1, 'blue'),
... (2, 'red'), (2, 'blue')]
>>> pd.MultiIndex.from_tuples(tuples, names=('number', 'color'))
MultiIndex([(1, 'red'),
(1, 'blue'),
(2, 'red'),
(2, 'blue')],
names=['number', 'color'])
Static methods defined here:
- __new__(cls, levels=None, codes=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity: 'bool' = True)
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- array
- Raises a ValueError for `MultiIndex` because there's no single
array backing a MultiIndex.
Raises
------
ValueError
- codes
- levshape
- A tuple with the length of each level.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([['a'], ['b'], ['c']])
>>> mi
MultiIndex([('a', 'b', 'c')],
)
>>> mi.levshape
(1, 1, 1)
- lexsort_depth
- nlevels
- Integer number of levels in this MultiIndex.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays([['a'], ['b'], ['c']])
>>> mi
MultiIndex([('a', 'b', 'c')],
)
>>> mi.nlevels
3
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Data descriptors defined here:
- dtype
- dtypes
- Return the dtypes as a Series for the underlying MultiIndex.
- inferred_type
- is_monotonic_decreasing
- return if the index is monotonic decreasing (only equal or
decreasing) values.
- is_monotonic_increasing
- return if the index is monotonic increasing (only equal or
increasing) values.
- levels
- names
- Names of levels in MultiIndex.
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.names
FrozenList(['x', 'y', 'z'])
- nbytes
- return the number of bytes in the underlying data
Data and other attributes defined here:
- __annotations__ = {'sortorder': 'int | None'}
Methods inherited from pandas.core.indexes.base.Index:
- __and__(self, other)
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __nonzero__(self)
- __or__(self, other)
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- is_type_compatible(self, kind: 'str_t') -> 'bool'
- Whether the index type is compatible with the provided type.
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
- max(self, axis=None, skipna=True, *args, **kwargs)
- Return the maximum value of the Index.
Parameters
----------
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Maximum value.
See Also
--------
Index.min : Return the minimum value in an Index.
Series.max : Return the maximum value in a Series.
DataFrame.max : Return the maximum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'
For a MultiIndex, the maximum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.max()
('b', 2)
- min(self, axis=None, skipna=True, *args, **kwargs)
- Return the minimum value of the Index.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Minimum value.
See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
For a MultiIndex, the minimum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.min()
('a', 1)
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- shift(self, periods=1, freq=None)
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or str, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.Index
Shifted index.
See Also
--------
Series.shift : Shift values of Series.
Notes
-----
This method is only implemented for datetime-like index classes,
i.e., DatetimeIndex, PeriodIndex and TimedeltaIndex.
Examples
--------
Put the first 5 month starts of 2011 into an index.
>>> month_starts = pd.date_range('1/1/2011', periods=5, freq='MS')
>>> month_starts
DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01',
'2011-05-01'],
dtype='datetime64[ns]', freq='MS')
Shift the index by 10 days.
>>> month_starts.shift(10, freq='D')
DatetimeIndex(['2011-01-11', '2011-02-11', '2011-03-11', '2011-04-11',
'2011-05-11'],
dtype='datetime64[ns]', freq=None)
The default value of `freq` is the `freq` attribute of the index,
which is 'MS' (month start) in this example.
>>> month_starts.shift(10)
DatetimeIndex(['2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01'],
dtype='datetime64[ns]', freq='MS')
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- shape
- Return a tuple of the shape of the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- hasnans
- Return True if there are any NaNs.
Enables various performance speedups.
- is_all_dates
- Whether or not the index values only consist of dates.
- is_unique
- Return if the index has unique values.
- name
- Return Index or MultiIndex name.
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __eq__(self, other)
- Return self==value.
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __ne__(self, other)
- Return self!=value.
- __rand__(self, other)
- __ror__(self, other)
- __rxor__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class NamedAgg(builtins.tuple) |
| |
NamedAgg(column: 'Hashable', aggfunc: 'AggScalar')
NamedAgg(column, aggfunc) |
| |
- Method resolution order:
- NamedAgg
- builtins.tuple
- builtins.object
Methods defined here:
- __getnewargs__(self)
- Return self as a plain tuple. Used by copy and pickle.
- __repr__(self)
- Return a nicely formatted representation string
- _asdict(self)
- Return a new dict which maps field names to their values.
- _replace(self, /, **kwds)
- Return a new NamedAgg object replacing specified fields with new values
Class methods defined here:
- _make(iterable) from builtins.type
- Make a new NamedAgg object from a sequence or iterable
Static methods defined here:
- __new__(_cls, column: 'Hashable', aggfunc: 'AggScalar')
- Create new instance of NamedAgg(column, aggfunc)
Data descriptors defined here:
- column
- Alias for field number 0
- aggfunc
- Alias for field number 1
Data and other attributes defined here:
- __annotations__ = {'aggfunc': ForwardRef('AggScalar'), 'column': ForwardRef('Hashable')}
- _field_defaults = {}
- _field_types = {'aggfunc': ForwardRef('AggScalar'), 'column': ForwardRef('Hashable')}
- _fields = ('column', 'aggfunc')
- _fields_defaults = {}
Methods inherited from builtins.tuple:
- __add__(self, value, /)
- Return self+value.
- __contains__(self, key, /)
- Return key in self.
- __eq__(self, value, /)
- Return self==value.
- __ge__(self, value, /)
- Return self>=value.
- __getattribute__(self, name, /)
- Return getattr(self, name).
- __getitem__(self, key, /)
- Return self[key].
- __gt__(self, value, /)
- Return self>value.
- __hash__(self, /)
- Return hash(self).
- __iter__(self, /)
- Implement iter(self).
- __le__(self, value, /)
- Return self<=value.
- __len__(self, /)
- Return len(self).
- __lt__(self, value, /)
- Return self<value.
- __mul__(self, value, /)
- Return self*value.
- __ne__(self, value, /)
- Return self!=value.
- __rmul__(self, value, /)
- Return value*self.
- count(self, value, /)
- Return number of occurrences of value.
- index(self, value, start=0, stop=9223372036854775807, /)
- Return first index of value.
Raises ValueError if the value is not present.
|
class Period(_Period) |
| |
Period(value=None, freq=None, ordinal=None, year=None, month=None, quarter=None, day=None, hour=None, minute=None, second=None)
Represents a period of time.
Parameters
----------
value : Period or str, default None
The time period represented (e.g., '4Q2005').
freq : str, default None
One of pandas period strings or corresponding objects.
ordinal : int, default None
The period offset from the proleptic Gregorian epoch.
year : int, default None
Year value of the period.
month : int, default 1
Month value of the period.
quarter : int, default None
Quarter value of the period.
day : int, default 1
Day value of the period.
hour : int, default 0
Hour value of the period.
minute : int, default 0
Minute value of the period.
second : int, default 0
Second value of the period. |
| |
- Method resolution order:
- Period
- _Period
- PeriodMixin
- builtins.object
Methods defined here:
- __new__(cls, value=None, freq=None, ordinal=None, year=None, month=None, quarter=None, day=None, hour=None, minute=None, second=None)
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Methods inherited from _Period:
- __add__(self, value, /)
- Return self+value.
- __eq__(self, value, /)
- Return self==value.
- __ge__(self, value, /)
- Return self>=value.
- __gt__(self, value, /)
- Return self>value.
- __hash__(self, /)
- Return hash(self).
- __le__(self, value, /)
- Return self<=value.
- __lt__(self, value, /)
- Return self<value.
- __ne__(self, value, /)
- Return self!=value.
- __radd__(self, value, /)
- Return value+self.
- __reduce__(...)
- Helper for pickle.
- __repr__(self, /)
- Return repr(self).
- __rsub__(self, value, /)
- Return value-self.
- __setstate__(...)
- __str__(...)
- Return a string representation for a particular DataFrame
- __sub__(self, value, /)
- Return self-value.
- asfreq(...)
- Convert Period to desired frequency, at the start or end of the interval.
Parameters
----------
freq : str
The desired frequency.
how : {'E', 'S', 'end', 'start'}, default 'end'
Start or end of the timespan.
Returns
-------
resampled : Period
- strftime(...)
- Returns the string representation of the :class:`Period`, depending
on the selected ``fmt``. ``fmt`` must be a string
containing one or several directives. The method recognizes the same
directives as the :func:`time.strftime` function of the standard Python
distribution, as well as the specific additional directives ``%f``,
``%F``, ``%q``. (formatting & docs originally from scikits.timeries).
+-----------+--------------------------------+-------+
| Directive | Meaning | Notes |
+===========+================================+=======+
| ``%a`` | Locale's abbreviated weekday | |
| | name. | |
+-----------+--------------------------------+-------+
| ``%A`` | Locale's full weekday name. | |
+-----------+--------------------------------+-------+
| ``%b`` | Locale's abbreviated month | |
| | name. | |
+-----------+--------------------------------+-------+
| ``%B`` | Locale's full month name. | |
+-----------+--------------------------------+-------+
| ``%c`` | Locale's appropriate date and | |
| | time representation. | |
+-----------+--------------------------------+-------+
| ``%d`` | Day of the month as a decimal | |
| | number [01,31]. | |
+-----------+--------------------------------+-------+
| ``%f`` | 'Fiscal' year without a | \(1) |
| | century as a decimal number | |
| | [00,99] | |
+-----------+--------------------------------+-------+
| ``%F`` | 'Fiscal' year with a century | \(2) |
| | as a decimal number | |
+-----------+--------------------------------+-------+
| ``%H`` | Hour (24-hour clock) as a | |
| | decimal number [00,23]. | |
+-----------+--------------------------------+-------+
| ``%I`` | Hour (12-hour clock) as a | |
| | decimal number [01,12]. | |
+-----------+--------------------------------+-------+
| ``%j`` | Day of the year as a decimal | |
| | number [001,366]. | |
+-----------+--------------------------------+-------+
| ``%m`` | Month as a decimal number | |
| | [01,12]. | |
+-----------+--------------------------------+-------+
| ``%M`` | Minute as a decimal number | |
| | [00,59]. | |
+-----------+--------------------------------+-------+
| ``%p`` | Locale's equivalent of either | \(3) |
| | AM or PM. | |
+-----------+--------------------------------+-------+
| ``%q`` | Quarter as a decimal number | |
| | [01,04] | |
+-----------+--------------------------------+-------+
| ``%S`` | Second as a decimal number | \(4) |
| | [00,61]. | |
+-----------+--------------------------------+-------+
| ``%U`` | Week number of the year | \(5) |
| | (Sunday as the first day of | |
| | the week) as a decimal number | |
| | [00,53]. All days in a new | |
| | year preceding the first | |
| | Sunday are considered to be in | |
| | week 0. | |
+-----------+--------------------------------+-------+
| ``%w`` | Weekday as a decimal number | |
| | [0(Sunday),6]. | |
+-----------+--------------------------------+-------+
| ``%W`` | Week number of the year | \(5) |
| | (Monday as the first day of | |
| | the week) as a decimal number | |
| | [00,53]. All days in a new | |
| | year preceding the first | |
| | Monday are considered to be in | |
| | week 0. | |
+-----------+--------------------------------+-------+
| ``%x`` | Locale's appropriate date | |
| | representation. | |
+-----------+--------------------------------+-------+
| ``%X`` | Locale's appropriate time | |
| | representation. | |
+-----------+--------------------------------+-------+
| ``%y`` | Year without century as a | |
| | decimal number [00,99]. | |
+-----------+--------------------------------+-------+
| ``%Y`` | Year with century as a decimal | |
| | number. | |
+-----------+--------------------------------+-------+
| ``%Z`` | Time zone name (no characters | |
| | if no time zone exists). | |
+-----------+--------------------------------+-------+
| ``%%`` | A literal ``'%'`` character. | |
+-----------+--------------------------------+-------+
Notes
-----
(1)
The ``%f`` directive is the same as ``%y`` if the frequency is
not quarterly.
Otherwise, it corresponds to the 'fiscal' year, as defined by
the :attr:`qyear` attribute.
(2)
The ``%F`` directive is the same as ``%Y`` if the frequency is
not quarterly.
Otherwise, it corresponds to the 'fiscal' year, as defined by
the :attr:`qyear` attribute.
(3)
The ``%p`` directive only affects the output hour field
if the ``%I`` directive is used to parse the hour.
(4)
The range really is ``0`` to ``61``; this accounts for leap
seconds and the (very rare) double leap seconds.
(5)
The ``%U`` and ``%W`` directives are only used in calculations
when the day of the week and the year are specified.
Examples
--------
>>> a = Period(freq='Q-JUL', year=2006, quarter=1)
>>> a.strftime('%F-Q%q')
'2006-Q1'
>>> # Output the last month in the quarter of this date
>>> a.strftime('%b-%Y')
'Oct-2005'
>>>
>>> a = Period(freq='D', year=2001, month=1, day=1)
>>> a.strftime('%d-%b-%Y')
'01-Jan-2001'
>>> a.strftime('%b. %d, %Y was a %A')
'Jan. 01, 2001 was a Monday'
- to_timestamp(...)
- Return the Timestamp representation of the Period.
Uses the target frequency specified at the part of the period specified
by `how`, which is either `Start` or `Finish`.
Parameters
----------
freq : str or DateOffset
Target frequency. Default is 'D' if self.freq is week or
longer and 'S' otherwise.
how : str, default 'S' (start)
One of 'S', 'E'. Can be aliased as case insensitive
'Start', 'Finish', 'Begin', 'End'.
Returns
-------
Timestamp
Class methods inherited from _Period:
- now(...) from builtins.type
- Return the period of now's date.
Data descriptors inherited from _Period:
- day
- Get day of the month that a Period falls on.
Returns
-------
int
See Also
--------
Period.dayofweek : Get the day of the week.
Period.dayofyear : Get the day of the year.
Examples
--------
>>> p = pd.Period("2018-03-11", freq='H')
>>> p.day
11
- day_of_week
- Day of the week the period lies in, with Monday=0 and Sunday=6.
If the period frequency is lower than daily (e.g. hourly), and the
period spans over multiple days, the day at the start of the period is
used.
If the frequency is higher than daily (e.g. monthly), the last day
of the period is used.
Returns
-------
int
Day of the week.
See Also
--------
Period.day_of_week : Day of the week the period lies in.
Period.weekday : Alias of Period.day_of_week.
Period.day : Day of the month.
Period.dayofyear : Day of the year.
Examples
--------
>>> per = pd.Period('2017-12-31 22:00', 'H')
>>> per.day_of_week
6
For periods that span over multiple days, the day at the beginning of
the period is returned.
>>> per = pd.Period('2017-12-31 22:00', '4H')
>>> per.day_of_week
6
>>> per.start_time.day_of_week
6
For periods with a frequency higher than days, the last day of the
period is returned.
>>> per = pd.Period('2018-01', 'M')
>>> per.day_of_week
2
>>> per.end_time.day_of_week
2
- day_of_year
- Return the day of the year.
This attribute returns the day of the year on which the particular
date occurs. The return value ranges between 1 to 365 for regular
years and 1 to 366 for leap years.
Returns
-------
int
The day of year.
See Also
--------
Period.day : Return the day of the month.
Period.day_of_week : Return the day of week.
PeriodIndex.day_of_year : Return the day of year of all indexes.
Examples
--------
>>> period = pd.Period("2015-10-23", freq='H')
>>> period.day_of_year
296
>>> period = pd.Period("2012-12-31", freq='D')
>>> period.day_of_year
366
>>> period = pd.Period("2013-01-01", freq='D')
>>> period.day_of_year
1
- dayofweek
- Day of the week the period lies in, with Monday=0 and Sunday=6.
If the period frequency is lower than daily (e.g. hourly), and the
period spans over multiple days, the day at the start of the period is
used.
If the frequency is higher than daily (e.g. monthly), the last day
of the period is used.
Returns
-------
int
Day of the week.
See Also
--------
Period.day_of_week : Day of the week the period lies in.
Period.weekday : Alias of Period.day_of_week.
Period.day : Day of the month.
Period.dayofyear : Day of the year.
Examples
--------
>>> per = pd.Period('2017-12-31 22:00', 'H')
>>> per.day_of_week
6
For periods that span over multiple days, the day at the beginning of
the period is returned.
>>> per = pd.Period('2017-12-31 22:00', '4H')
>>> per.day_of_week
6
>>> per.start_time.day_of_week
6
For periods with a frequency higher than days, the last day of the
period is returned.
>>> per = pd.Period('2018-01', 'M')
>>> per.day_of_week
2
>>> per.end_time.day_of_week
2
- dayofyear
- Return the day of the year.
This attribute returns the day of the year on which the particular
date occurs. The return value ranges between 1 to 365 for regular
years and 1 to 366 for leap years.
Returns
-------
int
The day of year.
See Also
--------
Period.day : Return the day of the month.
Period.day_of_week : Return the day of week.
PeriodIndex.day_of_year : Return the day of year of all indexes.
Examples
--------
>>> period = pd.Period("2015-10-23", freq='H')
>>> period.day_of_year
296
>>> period = pd.Period("2012-12-31", freq='D')
>>> period.day_of_year
366
>>> period = pd.Period("2013-01-01", freq='D')
>>> period.day_of_year
1
- days_in_month
- Get the total number of days in the month that this period falls on.
Returns
-------
int
See Also
--------
Period.daysinmonth : Gets the number of days in the month.
DatetimeIndex.daysinmonth : Gets the number of days in the month.
calendar.monthrange : Returns a tuple containing weekday
(0-6 ~ Mon-Sun) and number of days (28-31).
Examples
--------
>>> p = pd.Period('2018-2-17')
>>> p.days_in_month
28
>>> pd.Period('2018-03-01').days_in_month
31
Handles the leap year case as well:
>>> p = pd.Period('2016-2-17')
>>> p.days_in_month
29
- daysinmonth
- Get the total number of days of the month that the Period falls in.
Returns
-------
int
See Also
--------
Period.days_in_month : Return the days of the month.
Period.dayofyear : Return the day of the year.
Examples
--------
>>> p = pd.Period("2018-03-11", freq='H')
>>> p.daysinmonth
31
- freq
- freqstr
- Return a string representation of the frequency.
- hour
- Get the hour of the day component of the Period.
Returns
-------
int
The hour as an integer, between 0 and 23.
See Also
--------
Period.second : Get the second component of the Period.
Period.minute : Get the minute component of the Period.
Examples
--------
>>> p = pd.Period("2018-03-11 13:03:12.050000")
>>> p.hour
13
Period longer than a day
>>> p = pd.Period("2018-03-11", freq="M")
>>> p.hour
0
- is_leap_year
- Return True if the period's year is in a leap year.
- minute
- Get minute of the hour component of the Period.
Returns
-------
int
The minute as an integer, between 0 and 59.
See Also
--------
Period.hour : Get the hour component of the Period.
Period.second : Get the second component of the Period.
Examples
--------
>>> p = pd.Period("2018-03-11 13:03:12.050000")
>>> p.minute
3
- month
- Return the month this Period falls on.
- ordinal
- quarter
- Return the quarter this Period falls on.
- qyear
- Fiscal year the Period lies in according to its starting-quarter.
The `year` and the `qyear` of the period will be the same if the fiscal
and calendar years are the same. When they are not, the fiscal year
can be different from the calendar year of the period.
Returns
-------
int
The fiscal year of the period.
See Also
--------
Period.year : Return the calendar year of the period.
Examples
--------
If the natural and fiscal year are the same, `qyear` and `year` will
be the same.
>>> per = pd.Period('2018Q1', freq='Q')
>>> per.qyear
2018
>>> per.year
2018
If the fiscal year starts in April (`Q-MAR`), the first quarter of
2018 will start in April 2017. `year` will then be 2018, but `qyear`
will be the fiscal year, 2018.
>>> per = pd.Period('2018Q1', freq='Q-MAR')
>>> per.start_time
Timestamp('2017-04-01 00:00:00')
>>> per.qyear
2018
>>> per.year
2017
- second
- Get the second component of the Period.
Returns
-------
int
The second of the Period (ranges from 0 to 59).
See Also
--------
Period.hour : Get the hour component of the Period.
Period.minute : Get the minute component of the Period.
Examples
--------
>>> p = pd.Period("2018-03-11 13:03:12.050000")
>>> p.second
12
- week
- Get the week of the year on the given Period.
Returns
-------
int
See Also
--------
Period.dayofweek : Get the day component of the Period.
Period.weekday : Get the day component of the Period.
Examples
--------
>>> p = pd.Period("2018-03-11", "H")
>>> p.week
10
>>> p = pd.Period("2018-02-01", "D")
>>> p.week
5
>>> p = pd.Period("2018-01-06", "D")
>>> p.week
1
- weekday
- Day of the week the period lies in, with Monday=0 and Sunday=6.
If the period frequency is lower than daily (e.g. hourly), and the
period spans over multiple days, the day at the start of the period is
used.
If the frequency is higher than daily (e.g. monthly), the last day
of the period is used.
Returns
-------
int
Day of the week.
See Also
--------
Period.dayofweek : Day of the week the period lies in.
Period.weekday : Alias of Period.dayofweek.
Period.day : Day of the month.
Period.dayofyear : Day of the year.
Examples
--------
>>> per = pd.Period('2017-12-31 22:00', 'H')
>>> per.dayofweek
6
For periods that span over multiple days, the day at the beginning of
the period is returned.
>>> per = pd.Period('2017-12-31 22:00', '4H')
>>> per.dayofweek
6
>>> per.start_time.dayofweek
6
For periods with a frequency higher than days, the last day of the
period is returned.
>>> per = pd.Period('2018-01', 'M')
>>> per.dayofweek
2
>>> per.end_time.dayofweek
2
- weekofyear
- Get the week of the year on the given Period.
Returns
-------
int
See Also
--------
Period.dayofweek : Get the day component of the Period.
Period.weekday : Get the day component of the Period.
Examples
--------
>>> p = pd.Period("2018-03-11", "H")
>>> p.weekofyear
10
>>> p = pd.Period("2018-02-01", "D")
>>> p.weekofyear
5
>>> p = pd.Period("2018-01-06", "D")
>>> p.weekofyear
1
- year
- Return the year this Period falls on.
Data and other attributes inherited from _Period:
- __array_priority__ = 100
- __pyx_vtable__ = <capsule object NULL>
Data descriptors inherited from PeriodMixin:
- end_time
- Get the Timestamp for the end of the period.
Returns
-------
Timestamp
See Also
--------
Period.start_time : Return the start Timestamp.
Period.dayofyear : Return the day of year.
Period.daysinmonth : Return the days in that month.
Period.dayofweek : Return the day of the week.
- start_time
- Get the Timestamp for the start of the period.
Returns
-------
Timestamp
See Also
--------
Period.end_time : Return the end Timestamp.
Period.dayofyear : Return the day of year.
Period.daysinmonth : Return the days in that month.
Period.dayofweek : Return the day of the week.
Examples
--------
>>> period = pd.Period('2012-1-1', freq='D')
>>> period
Period('2012-01-01', 'D')
>>> period.start_time
Timestamp('2012-01-01 00:00:00')
>>> period.end_time
Timestamp('2012-01-01 23:59:59.999999999')
|
class PeriodDtype(pandas._libs.tslibs.dtypes.PeriodDtypeBase, PandasExtensionDtype) |
| |
PeriodDtype(freq=None)
An ExtensionDtype for Period data.
**This is not an actual numpy dtype**, but a duck type.
Parameters
----------
freq : str or DateOffset
The frequency of this PeriodDtype.
Attributes
----------
freq
Methods
-------
None
Examples
--------
>>> pd.PeriodDtype(freq='D')
period[D]
>>> pd.PeriodDtype(freq=pd.offsets.MonthEnd())
period[M] |
| |
- Method resolution order:
- PeriodDtype
- pandas._libs.tslibs.dtypes.PeriodDtypeBase
- PandasExtensionDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __eq__(self, other: 'Any') -> 'bool'
- Return self==value.
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'PeriodArray'
- Construct PeriodArray from pyarrow Array/ChunkedArray.
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __reduce__(self)
- Helper for pickle.
- __setstate__(self, state)
- __str__(self) -> 'str_type'
- Return str(self).
Class methods defined here:
- construct_array_type() -> 'type_t[PeriodArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
- construct_from_string(string: 'str_type') -> 'PeriodDtype' from builtins.type
- Strict construction from a string, raise a TypeError if not
possible
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Return a boolean if we if the passed type is an actual dtype that we
can match (via string or type)
Static methods defined here:
- __new__(cls, freq=None)
- Parameters
----------
freq : frequency
Readonly properties defined here:
- freq
- The frequency object of this PeriodDtype.
- na_value
- Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the
user-facing "boxed" version of the NA value, not the physical NA value
for storage. e.g. for JSONArray, this is an empty dictionary.
- name
- A string identifying the data type.
Will be used for display in, e.g. ``Series.dtype``
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- __annotations__ = {'_cache_dtypes': 'dict[str_type, PandasExtensionDtype]', 'kind': 'str_type', 'type': 'type[Period]'}
- base = dtype('O')
- kind = 'O'
- num = 102
- str = '|O08'
- type = <class 'pandas._libs.tslibs.period.Period'>
- Represents a period of time.
Parameters
----------
value : Period or str, default None
The time period represented (e.g., '4Q2005').
freq : str, default None
One of pandas period strings or corresponding objects.
ordinal : int, default None
The period offset from the proleptic Gregorian epoch.
year : int, default None
Year value of the period.
month : int, default 1
Month value of the period.
quarter : int, default None
Quarter value of the period.
day : int, default 1
Day value of the period.
hour : int, default 0
Hour value of the period.
minute : int, default 0
Minute value of the period.
second : int, default 0
Second value of the period.
Methods inherited from pandas._libs.tslibs.dtypes.PeriodDtypeBase:
- __ge__(self, value, /)
- Return self>=value.
- __gt__(self, value, /)
- Return self>value.
- __le__(self, value, /)
- Return self<=value.
- __lt__(self, value, /)
- Return self<value.
Class methods inherited from pandas._libs.tslibs.dtypes.PeriodDtypeBase:
- from_date_offset(...) from builtins.type
Data descriptors inherited from pandas._libs.tslibs.dtypes.PeriodDtypeBase:
- date_offset
- Corresponding DateOffset object.
This mapping is mainly for backward-compatibility.
- freq_group_code
- resolution
Methods inherited from PandasExtensionDtype:
- __getstate__(self) -> 'dict[str_type, Any]'
- __repr__(self) -> 'str_type'
- Return a string representation for a particular object.
Class methods inherited from PandasExtensionDtype:
- reset_cache() -> 'None' from builtins.type
- clear the cache
Data and other attributes inherited from PandasExtensionDtype:
- isbuiltin = 0
- isnative = 0
- itemsize = 8
- shape = ()
- subdtype = None
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
|
class PeriodIndex(pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin) |
| |
PeriodIndex(data=None, ordinal=None, freq=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None, **fields) -> 'PeriodIndex'
Immutable ndarray holding ordinal values indicating regular periods in time.
Index keys are boxed to Period objects which carries the metadata (eg,
frequency information).
Parameters
----------
data : array-like (1d int np.ndarray or PeriodArray), optional
Optional period-like data to construct index with.
copy : bool
Make a copy of input ndarray.
freq : str or period object, optional
One of pandas period strings or corresponding objects.
year : int, array, or Series, default None
month : int, array, or Series, default None
quarter : int, array, or Series, default None
day : int, array, or Series, default None
hour : int, array, or Series, default None
minute : int, array, or Series, default None
second : int, array, or Series, default None
dtype : str or PeriodDtype, default None
Attributes
----------
day
dayofweek
day_of_week
dayofyear
day_of_year
days_in_month
daysinmonth
end_time
freq
freqstr
hour
is_leap_year
minute
month
quarter
qyear
second
start_time
week
weekday
weekofyear
year
Methods
-------
asfreq
strftime
to_timestamp
See Also
--------
Index : The base pandas Index type.
Period : Represents a period of time.
DatetimeIndex : Index with datetime64 data.
TimedeltaIndex : Index of timedelta64 data.
period_range : Create a fixed-frequency PeriodIndex.
Examples
--------
>>> idx = pd.PeriodIndex(year=[2000, 2002], quarter=[1, 3])
>>> idx
PeriodIndex(['2000Q1', '2002Q3'], dtype='period[Q-DEC]') |
| |
- Method resolution order:
- PeriodIndex
- pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin
- pandas.core.indexes.extension.NDArrayBackedExtensionIndex
- pandas.core.indexes.extension.ExtensionIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- asfreq(self, freq=None, how: 'str' = 'E') -> 'PeriodIndex'
- Convert the PeriodArray to the specified frequency `freq`.
Equivalent to applying :meth:`pandas.Period.asfreq` with the given arguments
to each :class:`~pandas.Period` in this PeriodArray.
Parameters
----------
freq : str
A frequency.
how : str {'E', 'S'}, default 'E'
Whether the elements should be aligned to the end
or start within pa period.
* 'E', 'END', or 'FINISH' for end,
* 'S', 'START', or 'BEGIN' for start.
January 31st ('END') vs. January 1st ('START') for example.
Returns
-------
PeriodArray
The transformed PeriodArray with the new frequency.
See Also
--------
pandas.arrays.PeriodArray.asfreq: Convert each Period in a PeriodArray to the given frequency.
Period.asfreq : Convert a :class:`~pandas.Period` object to the given frequency.
Examples
--------
>>> pidx = pd.period_range('2010-01-01', '2015-01-01', freq='A')
>>> pidx
PeriodIndex(['2010', '2011', '2012', '2013', '2014', '2015'],
dtype='period[A-DEC]')
>>> pidx.asfreq('M')
PeriodIndex(['2010-12', '2011-12', '2012-12', '2013-12', '2014-12',
'2015-12'], dtype='period[M]')
>>> pidx.asfreq('M', how='S')
PeriodIndex(['2010-01', '2011-01', '2012-01', '2013-01', '2014-01',
'2015-01'], dtype='period[M]')
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'np.ndarray'
- where : array of timestamps
mask : np.ndarray[bool]
Array of booleans where data is not NA.
- astype(self, dtype, copy: 'bool' = True, how=<no_default>)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- get_loc(self, key, method=None, tolerance=None)
- Get integer location for requested label.
Parameters
----------
key : Period, NaT, str, or datetime
String or datetime key must be parsable as Period.
Returns
-------
loc : int or ndarray[int64]
Raises
------
KeyError
Key is not present in the index.
TypeError
If key is listlike or otherwise not hashable.
- strftime(self, *args, **kwargs)
- Convert to Index using specified date_format.
Return an Index of formatted strings specified by date_format, which
supports the same string format as the python standard library. Details
of the string format can be found in `python string format
doc <https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>`__.
Parameters
----------
date_format : str
Date format string (e.g. "%Y-%m-%d").
Returns
-------
ndarray[object]
NumPy ndarray of formatted strings.
See Also
--------
to_datetime : Convert the given argument to datetime.
DatetimeIndex.normalize : Return DatetimeIndex with times to midnight.
DatetimeIndex.round : Round the DatetimeIndex to the specified freq.
DatetimeIndex.floor : Floor the DatetimeIndex to the specified freq.
Examples
--------
>>> rng = pd.date_range(pd.Timestamp("2018-03-10 09:00"),
... periods=3, freq='s')
>>> rng.strftime('%B %d, %Y, %r')
Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',
'March 10, 2018, 09:00:02 AM'],
dtype='object')
- to_timestamp(self, freq=None, how='start') -> 'DatetimeIndex'
- Cast to DatetimeArray/Index.
Parameters
----------
freq : str or DateOffset, optional
Target frequency. The default is 'D' for week or longer,
'S' otherwise.
how : {'s', 'e', 'start', 'end'}
Whether to use the start or end of the time period being converted.
Returns
-------
DatetimeArray/Index
Static methods defined here:
- __new__(cls, data=None, ordinal=None, freq=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None, **fields) -> 'PeriodIndex'
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- inferred_type
- Return a string of the type inferred from the values.
- is_full
- Returns True if this PeriodIndex is range-like in that all Periods
between start and end are present, in order.
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Data descriptors defined here:
- day
- The days of the period.
- day_of_week
- The day of the week with Monday=0, Sunday=6.
- day_of_year
- The ordinal day of the year.
- dayofweek
- The day of the week with Monday=0, Sunday=6.
- dayofyear
- The ordinal day of the year.
- days_in_month
- The number of days in the month.
- daysinmonth
- The number of days in the month.
- end_time
- hour
- The hour of the period.
- is_leap_year
- Logical indicating if the date belongs to a leap year.
- minute
- The minute of the period.
- month
- The month as January=1, December=12.
- quarter
- The quarter of the date.
- qyear
- second
- The second of the period.
- start_time
- week
- The week ordinal of the year.
- weekday
- The day of the week with Monday=0, Sunday=6.
- weekofyear
- The week ordinal of the year.
- year
- The year of the period.
Data and other attributes defined here:
- __annotations__ = {'_data': 'PeriodArray', 'dtype': 'PeriodDtype', 'freq': 'BaseOffset'}
Methods inherited from pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin:
- __contains__(self, key: 'Any') -> 'bool'
- Return a boolean indicating whether the provided key is in the index.
Parameters
----------
key : label
The key to check if it is present in the index.
Returns
-------
bool
Whether the key search is in the index.
Raises
------
TypeError
If the key is not hashable.
See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
list-like key is in the index.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> 2 in idx
True
>>> 6 in idx
False
- equals(self, other: 'Any') -> 'bool'
- Determines if two Index objects contain the same elements.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str' = 'NaT', date_format: 'str | None' = None) -> 'list[str]'
- Render a string representation of the Index.
- mean(self, *args, **kwargs)
- Return the mean value of the Array.
.. versionadded:: 0.25.0
Parameters
----------
skipna : bool, default True
Whether to ignore any NaT elements.
axis : int, optional, default 0
Returns
-------
scalar
Timestamp or Timedelta.
See Also
--------
numpy.ndarray.mean : Returns the average of array elements along a given axis.
Series.mean : Return the mean value in a Series.
Notes
-----
mean is only defined for Datetime and Timedelta dtypes, not for Period.
- shift(self: '_T', periods: 'int' = 1, freq=None) -> '_T'
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or string, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.DatetimeIndex
Shifted index.
See Also
--------
Index.shift : Shift values of Index.
PeriodIndex.shift : Shift values of PeriodIndex.
Data descriptors inherited from pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- freq
- Return the frequency object if it is set, otherwise None.
- freqstr
- Return the frequency object as a string if its set, otherwise None.
- hasnans
- return if I have any nans; enables various perf speedups
- inferred_freq
- Tries to return a string representing a frequency guess,
generated by infer_freq. Returns None if it can't autodetect the
frequency.
- resolution
- Returns day, hour, minute, second, millisecond or microsecond
Methods inherited from pandas.core.indexes.extension.ExtensionIndex:
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
Methods inherited from pandas.core.indexes.base.Index:
- __abs__(self)
- __and__(self, other)
- __array__(self, dtype=None) -> 'np.ndarray'
- The array interface, return my values.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other)
- __invert__(self)
- __len__(self) -> 'int'
- Return the length of the Index.
- __neg__(self)
- __nonzero__(self)
- __or__(self, other)
- __pos__(self)
- __reduce__(self)
- Helper for pickle.
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Return the integer indices that would sort the index.
Parameters
----------
*args
Passed to `numpy.ndarray.argsort`.
**kwargs
Passed to `numpy.ndarray.argsort`.
Returns
-------
np.ndarray[np.intp]
Integer indices that would sort the index if used as
an indexer.
See Also
--------
numpy.argsort : Similar method for NumPy arrays.
Index.sort_values : Return sorted copy of Index.
Examples
--------
>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')
>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])
>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- copy(self: '_IndexT', name: 'Hashable | None' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names: 'Sequence[Hashable] | None' = None) -> '_IndexT'
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- delete(self: '_IndexT', loc) -> '_IndexT'
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- insert(self, loc: 'int', item) -> 'Index'
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- is_type_compatible(self, kind: 'str_t') -> 'bool'
- Whether the index type is compatible with the provided type.
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- max(self, axis=None, skipna=True, *args, **kwargs)
- Return the maximum value of the Index.
Parameters
----------
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Maximum value.
See Also
--------
Index.min : Return the minimum value in an Index.
Series.max : Return the maximum value in a Series.
DataFrame.max : Return the maximum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'
For a MultiIndex, the maximum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.max()
('b', 2)
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- min(self, axis=None, skipna=True, *args, **kwargs)
- Return the minimum value of the Index.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Minimum value.
See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
For a MultiIndex, the minimum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.min()
('a', 1)
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- take(self, indices, axis: 'int' = 0, allow_fill: 'bool' = True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- is_monotonic_decreasing
- Return if the index is monotonic decreasing (only equal or
decreasing) values.
Examples
--------
>>> Index([3, 2, 1]).is_monotonic_decreasing
True
>>> Index([3, 2, 2]).is_monotonic_decreasing
True
>>> Index([3, 1, 2]).is_monotonic_decreasing
False
- is_monotonic_increasing
- Return if the index is monotonic increasing (only equal or
increasing) values.
Examples
--------
>>> Index([1, 2, 3]).is_monotonic_increasing
True
>>> Index([1, 2, 2]).is_monotonic_increasing
True
>>> Index([1, 3, 2]).is_monotonic_increasing
False
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- dtype
- Return the dtype object of the underlying data.
- is_all_dates
- Whether or not the index values only consist of dates.
- is_unique
- Return if the index has unique values.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class RangeIndex(pandas.core.indexes.numeric.NumericIndex) |
| |
RangeIndex(start=None, stop=None, step=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None) -> 'RangeIndex'
Immutable Index implementing a monotonic integer range.
RangeIndex is a memory-saving special case of Int64Index limited to
representing monotonic ranges. Using RangeIndex may in some instances
improve computing speed.
This is the default index type used
by DataFrame and Series when no explicit index is provided by the user.
Parameters
----------
start : int (default: 0), range, or other RangeIndex instance
If int and "stop" is not given, interpreted as "stop" instead.
stop : int (default: 0)
step : int (default: 1)
dtype : np.int64
Unused, accepted for homogeneity with other index types.
copy : bool, default False
Unused, accepted for homogeneity with other index types.
name : object, optional
Name to be stored in the index.
Attributes
----------
start
stop
step
Methods
-------
from_range
See Also
--------
Index : The base pandas Index type.
Int64Index : Index of int64 data. |
| |
- Method resolution order:
- RangeIndex
- pandas.core.indexes.numeric.NumericIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __contains__(self, key: 'Any') -> 'bool'
- Check if key is a float and has a decimal. If it has, return False.
- __floordiv__(self, other)
- __getitem__(self, key)
- Conserve RangeIndex type for scalar and slice keys.
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- __len__(self) -> 'int'
- return the length of the RangeIndex
- __reduce__(self)
- Helper for pickle.
- all(self, *args, **kwargs) -> 'bool'
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs) -> 'bool'
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Returns the indices that would sort the index and its
underlying data.
Returns
-------
np.ndarray[np.intp]
See Also
--------
numpy.ndarray.argsort
- copy(self, name: 'Hashable' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names=None)
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- delete(self, loc) -> 'Index'
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- equals(self, other: 'object') -> 'bool'
- Determines if two Index objects contain the same elements.
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1) -> 'tuple[npt.NDArray[np.intp], RangeIndex]'
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- get_loc(self, key, method=None, tolerance=None)
- Get integer location, slice or boolean mask for requested label.
Parameters
----------
key : label
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
tolerance : int or float, optional
Maximum distance from index value for inexact matches. The value of
the index at the matching location must satisfy the equation
``abs(index[loc] - key) <= tolerance``.
Returns
-------
loc : int if unique index, slice if monotonic index, else mask
Examples
--------
>>> unique_index = pd.Index(list('abc'))
>>> unique_index.get_loc('b')
1
>>> monotonic_index = pd.Index(list('abbc'))
>>> monotonic_index.get_loc('b')
slice(1, 3, None)
>>> non_monotonic_index = pd.Index(list('abcb'))
>>> non_monotonic_index.get_loc('b')
array([False, True, False, True])
- insert(self, loc: 'int', item) -> 'Index'
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- max(self, axis=None, skipna: 'bool' = True, *args, **kwargs) -> 'int'
- The maximum value of the RangeIndex
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of my values
Parameters
----------
deep : bool
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption
Returns
-------
bytes used
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False
See Also
--------
numpy.ndarray.nbytes
- min(self, axis=None, skipna: 'bool' = True, *args, **kwargs) -> 'int'
- The minimum value of the RangeIndex
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- symmetric_difference(self, other, result_name: 'Hashable' = None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- tolist(self) -> 'list[int]'
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
Class methods defined here:
- from_range(data: 'range', name=None, dtype: 'Dtype | None' = None) -> 'RangeIndex' from builtins.type
- Create RangeIndex from a range object.
Returns
-------
RangeIndex
Static methods defined here:
- __new__(cls, start=None, stop=None, step=None, dtype: 'Dtype | None' = None, copy: 'bool' = False, name: 'Hashable' = None) -> 'RangeIndex'
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- dtype
- Return the dtype object of the underlying data.
- inferred_type
- Return a string of the type inferred from the values.
- is_unique
- return if the index has unique values
- size
- Return the number of elements in the underlying data.
- start
- The value of the `start` parameter (``0`` if this was not supplied).
- step
- The value of the `step` parameter (``1`` if this was not supplied).
- stop
- The value of the `stop` parameter.
Data descriptors defined here:
- is_monotonic_decreasing
- is_monotonic_increasing
- nbytes
- Return the number of bytes in the underlying data.
Data and other attributes defined here:
- __annotations__ = {'_is_backward_compat_public_numeric_index': 'bool', '_range': 'range'}
Methods inherited from pandas.core.indexes.numeric.NumericIndex:
- astype(self, dtype, copy: 'bool' = True)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
Methods inherited from pandas.core.indexes.base.Index:
- __abs__(self)
- __and__(self, other)
- __array__(self, dtype=None) -> 'np.ndarray'
- The array interface, return my values.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __iadd__(self, other)
- __invert__(self)
- __neg__(self)
- __nonzero__(self)
- __or__(self, other)
- __pos__(self)
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str_t' = 'NaN') -> 'list[str_t]'
- Render a string representation of the Index.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- is_type_compatible(self, kind: 'str_t') -> 'bool'
- Whether the index type is compatible with the provided type.
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- shift(self, periods=1, freq=None)
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or str, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.Index
Shifted index.
See Also
--------
Series.shift : Shift values of Series.
Notes
-----
This method is only implemented for datetime-like index classes,
i.e., DatetimeIndex, PeriodIndex and TimedeltaIndex.
Examples
--------
Put the first 5 month starts of 2011 into an index.
>>> month_starts = pd.date_range('1/1/2011', periods=5, freq='MS')
>>> month_starts
DatetimeIndex(['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01',
'2011-05-01'],
dtype='datetime64[ns]', freq='MS')
Shift the index by 10 days.
>>> month_starts.shift(10, freq='D')
DatetimeIndex(['2011-01-11', '2011-02-11', '2011-03-11', '2011-04-11',
'2011-05-11'],
dtype='datetime64[ns]', freq=None)
The default value of `freq` is the `freq` attribute of the index,
which is 'MS' (month start) in this example.
>>> month_starts.shift(10)
DatetimeIndex(['2011-11-01', '2011-12-01', '2012-01-01', '2012-02-01',
'2012-03-01'],
dtype='datetime64[ns]', freq='MS')
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- take(self, indices, axis: 'int' = 0, allow_fill: 'bool' = True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- hasnans
- Return True if there are any NaNs.
Enables various performance speedups.
- is_all_dates
- Whether or not the index values only consist of dates.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- ndim
- Number of dimensions of the underlying data, by definition 1.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class Series(pandas.core.base.IndexOpsMixin, pandas.core.generic.NDFrame) |
| |
Series(data=None, index=None, dtype: 'Dtype | None' = None, name=None, copy: 'bool' = False, fastpath: 'bool' = False)
One-dimensional ndarray with axis labels (including time series).
Labels need not be unique but must be a hashable type. The object
supports both integer- and label-based indexing and provides a host of
methods for performing operations involving the index. Statistical
methods from ndarray have been overridden to automatically exclude
missing data (currently represented as NaN).
Operations between Series (+, -, /, \*, \*\*) align values based on their
associated index values-- they need not be the same length. The result
index will be the sorted union of the two indexes.
Parameters
----------
data : array-like, Iterable, dict, or scalar value
Contains data stored in Series. If data is a dict, argument order is
maintained.
index : array-like or Index (1d)
Values must be hashable and have the same length as `data`.
Non-unique index values are allowed. Will default to
RangeIndex (0, 1, 2, ..., n) if not provided. If data is dict-like
and index is None, then the keys in the data are used as the index. If the
index is not None, the resulting Series is reindexed with the index values.
dtype : str, numpy.dtype, or ExtensionDtype, optional
Data type for the output Series. If not specified, this will be
inferred from `data`.
See the :ref:`user guide <basics.dtypes>` for more usages.
name : str, optional
The name to give to the Series.
copy : bool, default False
Copy input data. Only affects Series or 1d ndarray input. See examples.
Examples
--------
Constructing Series from a dictionary with an Index specified
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['a', 'b', 'c'])
>>> ser
a 1
b 2
c 3
dtype: int64
The keys of the dictionary match with the Index values, hence the Index
values have no effect.
>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> ser = pd.Series(data=d, index=['x', 'y', 'z'])
>>> ser
x NaN
y NaN
z NaN
dtype: float64
Note that the Index is first build with the keys from the dictionary.
After this the Series is reindexed with the given Index values, hence we
get all NaN as a result.
Constructing Series from a list with `copy=False`.
>>> r = [1, 2]
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
[1, 2]
>>> ser
0 999
1 2
dtype: int64
Due to input data type the Series has a `copy` of
the original data even though `copy=False`, so
the data is unchanged.
Constructing Series from a 1d ndarray with `copy=False`.
>>> r = np.array([1, 2])
>>> ser = pd.Series(r, copy=False)
>>> ser.iloc[0] = 999
>>> r
array([999, 2])
>>> ser
0 999
1 2
dtype: int64
Due to input data type the Series has a `view` on
the original data, so
the data is changed as well. |
| |
- Method resolution order:
- Series
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.generic.NDFrame
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- pandas.core.indexing.IndexingMixin
- builtins.object
Methods defined here:
- __array__(self, dtype: 'npt.DTypeLike | None' = None) -> 'np.ndarray'
- Return the values as a NumPy array.
Users should not call this directly. Rather, it is invoked by
:func:`numpy.array` and :func:`numpy.asarray`.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to use for the resulting NumPy array. By default,
the dtype is inferred from the data.
Returns
-------
numpy.ndarray
The values in the series converted to a :class:`numpy.ndarray`
with the specified `dtype`.
See Also
--------
array : Create a new array from data.
Series.array : Zero-copy view to the array backing the Series.
Series.to_numpy : Series method for similar behavior.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> np.asarray(ser)
array([1, 2, 3])
For timezone-aware data, the timezones may be retained with
``dtype='object'``
>>> tzser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> np.asarray(tzser, dtype="object")
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or the values may be localized to UTC and the tzinfo discarded with
``dtype='datetime64[ns]'``
>>> np.asarray(tzser, dtype="datetime64[ns]") # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', ...],
dtype='datetime64[ns]')
- __float__(self)
- __getitem__(self, key)
- __init__(self, data=None, index=None, dtype: 'Dtype | None' = None, name=None, copy: 'bool' = False, fastpath: 'bool' = False)
- Initialize self. See help(type(self)) for accurate signature.
- __int__(self)
- __len__(self) -> 'int'
- Return the length of the Series.
- __long__ = __int__(self)
- __matmul__(self, other)
- Matrix multiplication using binary `@` operator in Python>=3.5.
- __repr__(self) -> 'str'
- Return a string representation for a particular Series.
- __rmatmul__(self, other)
- Matrix multiplication using binary `@` operator in Python>=3.5.
- __setitem__(self, key, value) -> 'None'
- add(self, other, level=None, fill_value=None, axis=0)
- Return Addition of series and other, element-wise (binary operator `add`).
Equivalent to ``series + other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.radd : Reverse of the Addition operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.add(b, fill_value=0)
a 2.0
b 1.0
c 1.0
d 1.0
e NaN
dtype: float64
- agg = aggregate(self, func=None, axis=0, *args, **kwargs)
- aggregate(self, func=None, axis=0, *args, **kwargs)
- Aggregate using one or more operations over the specified axis.
Parameters
----------
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either
work when passed a Series or when passed to Series.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. ``[np.sum, 'mean']``
- dict of axis labels -> functions, function names or list of such.
axis : {0 or 'index'}
Parameter needed for compatibility with DataFrame.
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Returns
-------
scalar, Series or DataFrame
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
Return scalar, Series or DataFrame.
See Also
--------
Series.apply : Invoke function on a Series.
Series.transform : Transform function producing a Series with like indexes.
Notes
-----
`agg` is an alias for `aggregate`. Use the alias.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
A passed user-defined-function will be passed a Series for evaluation.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.agg('min')
1
>>> s.agg(['min', 'max'])
min 1
max 4
dtype: int64
- align(self, other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)
- Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
Parameters
----------
other : DataFrame or Series
join : {'outer', 'inner', 'left', 'right'}, default 'outer'
axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None).
level : int or level name, default None
Broadcast across a level, matching Index values on the
passed MultiIndex level.
copy : bool, default True
Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any
"compatible" value.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series:
- pad / ffill: propagate last valid observation forward to next valid.
- backfill / bfill: use NEXT valid observation to fill gap.
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
fill_axis : {0 or 'index'}, default 0
Filling axis, method and limit.
broadcast_axis : {0 or 'index'}, default None
Broadcast values along this axis, if aligning two objects of
different dimensions.
Returns
-------
(left, right) : (Series, type of other)
Aligned objects.
Examples
--------
>>> df = pd.DataFrame(
... [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = pd.DataFrame(
... [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
... columns=["A", "B", "C", "D"],
... index=[2, 3, 4],
... )
>>> df
D B E A
1 1 2 3 4
2 6 7 8 9
>>> other
A B C D
2 10 20 30 40
3 60 70 80 90
4 600 700 800 900
Align on columns:
>>> left, right = df.align(other, join="outer", axis=1)
>>> left
A B C D E
1 4 2 NaN 1 3
2 9 7 NaN 6 8
>>> right
A B C D E
2 10 20 30 40 NaN
3 60 70 80 90 NaN
4 600 700 800 900 NaN
We can also align on the index:
>>> left, right = df.align(other, join="outer", axis=0)
>>> left
D B E A
1 1.0 2.0 3.0 4.0
2 6.0 7.0 8.0 9.0
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
>>> right
A B C D
1 NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0
3 60.0 70.0 80.0 90.0
4 600.0 700.0 800.0 900.0
Finally, the default `axis=None` will align on both index and columns:
>>> left, right = df.align(other, join="outer", axis=None)
>>> left
A B C D E
1 4.0 2.0 NaN 1.0 3.0
2 9.0 7.0 NaN 6.0 8.0
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
>>> right
A B C D E
1 NaN NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0 NaN
3 60.0 70.0 80.0 90.0 NaN
4 600.0 700.0 800.0 900.0 NaN
- all(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs)
- Return whether all elements are True, potentially over an axis.
Returns True unless there at least one element within a series or
along a Dataframe axis that is False or equivalent (e.g. zero or
empty).
Parameters
----------
axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced.
* 0 / 'index' : reduce the index, return a Series whose index is the
original column labels.
* 1 / 'columns' : reduce the columns, return a Series whose index is the
original index.
* None : reduce all axes, return a scalar.
bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything,
then use only boolean data. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be True, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
**kwargs : any, default None
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
scalar or Series
If level is specified, then, Series is returned; otherwise, scalar
is returned.
See Also
--------
Series.all : Return True if all elements are True.
DataFrame.any : Return True if one (or more) elements are True.
Examples
--------
**Series**
>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True
**DataFrames**
Create a dataframe from a dictionary.
>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
col1 col2
0 True True
1 True False
Default behaviour checks if column-wise values all return True.
>>> df.all()
col1 True
col2 False
dtype: bool
Specify ``axis='columns'`` to check if row-wise values all return True.
>>> df.all(axis='columns')
0 True
1 False
dtype: bool
Or ``axis=None`` for whether every value is True.
>>> df.all(axis=None)
False
- any(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs)
- Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a series or
along a Dataframe axis that is True or equivalent (e.g. non-zero or
non-empty).
Parameters
----------
axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced.
* 0 / 'index' : reduce the index, return a Series whose index is the
original column labels.
* 1 / 'columns' : reduce the columns, return a Series whose index is the
original index.
* None : reduce all axes, return a scalar.
bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything,
then use only boolean data. Not implemented for Series.
skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is
True, then the result will be False, as for an empty row/column.
If skipna is False, then NA are treated as True, because these are not
equal to zero.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
**kwargs : any, default None
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
scalar or Series
If level is specified, then, Series is returned; otherwise, scalar
is returned.
See Also
--------
numpy.any : Numpy version of this method.
Series.any : Return whether any element is True.
Series.all : Return whether all elements are True.
DataFrame.any : Return whether any element is True over requested axis.
DataFrame.all : Return whether all elements are True over requested axis.
Examples
--------
**Series**
For Series input, the output is a scalar indicating whether any element
is True.
>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True
**DataFrame**
Whether each column contains at least one True element (the default).
>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
A B C
0 1 0 0
1 2 2 0
>>> df.any()
A True
B True
C False
dtype: bool
Aggregating over the columns.
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
A B
0 True 1
1 False 2
>>> df.any(axis='columns')
0 True
1 True
dtype: bool
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
A B
0 True 1
1 False 0
>>> df.any(axis='columns')
0 True
1 False
dtype: bool
Aggregating over the entire DataFrame with ``axis=None``.
>>> df.any(axis=None)
True
`any` for an empty DataFrame is an empty Series.
>>> pd.DataFrame([]).any()
Series([], dtype: bool)
- append(self, to_append, ignore_index: 'bool' = False, verify_integrity: 'bool' = False)
- Concatenate two or more Series.
.. deprecated:: 1.4.0
Use :func:`concat` instead. For further details see
:ref:`whatsnew_140.deprecations.frame_series_append`
Parameters
----------
to_append : Series or list/tuple of Series
Series to append with self.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
verify_integrity : bool, default False
If True, raise Exception on creating index with duplicates.
Returns
-------
Series
Concatenated Series.
See Also
--------
concat : General function to concatenate DataFrame or Series objects.
Notes
-----
Iteratively appending to a Series can be more computationally intensive
than a single concatenate. A better solution is to append values to a
list and then concatenate the list with the original Series all at
once.
Examples
--------
>>> s1 = pd.Series([1, 2, 3])
>>> s2 = pd.Series([4, 5, 6])
>>> s3 = pd.Series([4, 5, 6], index=[3, 4, 5])
>>> s1.append(s2)
0 1
1 2
2 3
0 4
1 5
2 6
dtype: int64
>>> s1.append(s3)
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
With `ignore_index` set to True:
>>> s1.append(s2, ignore_index=True)
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
With `verify_integrity` set to True:
>>> s1.append(s2, verify_integrity=True)
Traceback (most recent call last):
...
ValueError: Indexes have overlapping values: [0, 1, 2]
- apply(self, func: 'AggFuncType', convert_dtype: 'bool' = True, args: 'tuple[Any, ...]' = (), **kwargs) -> 'DataFrame | Series'
- Invoke function on values of Series.
Can be ufunc (a NumPy function that applies to the entire Series)
or a Python function that only works on single values.
Parameters
----------
func : function
Python function or NumPy ufunc to apply.
convert_dtype : bool, default True
Try to find better dtype for elementwise function results. If
False, leave as dtype=object. Note that the dtype is always
preserved for some extension array dtypes, such as Categorical.
args : tuple
Positional arguments passed to func after the series value.
**kwargs
Additional keyword arguments passed to func.
Returns
-------
Series or DataFrame
If func returns a Series object the result will be a DataFrame.
See Also
--------
Series.map: For element-wise operations.
Series.agg: Only perform aggregating type operations.
Series.transform: Only perform transforming type operations.
Notes
-----
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
Examples
--------
Create a series with typical summer temperatures for each city.
>>> s = pd.Series([20, 21, 12],
... index=['London', 'New York', 'Helsinki'])
>>> s
London 20
New York 21
Helsinki 12
dtype: int64
Square the values by defining a function and passing it as an
argument to ``apply()``.
>>> def square(x):
... return x ** 2
>>> s.apply(square)
London 400
New York 441
Helsinki 144
dtype: int64
Square the values by passing an anonymous function as an
argument to ``apply()``.
>>> s.apply(lambda x: x ** 2)
London 400
New York 441
Helsinki 144
dtype: int64
Define a custom function that needs additional positional
arguments and pass these additional arguments using the
``args`` keyword.
>>> def subtract_custom_value(x, custom_value):
... return x - custom_value
>>> s.apply(subtract_custom_value, args=(5,))
London 15
New York 16
Helsinki 7
dtype: int64
Define a custom function that takes keyword arguments
and pass these arguments to ``apply``.
>>> def add_custom_values(x, **kwargs):
... for month in kwargs:
... x += kwargs[month]
... return x
>>> s.apply(add_custom_values, june=30, july=20, august=25)
London 95
New York 96
Helsinki 87
dtype: int64
Use a function from the Numpy library.
>>> s.apply(np.log)
London 2.995732
New York 3.044522
Helsinki 2.484907
dtype: float64
- argsort(self, axis=0, kind='quicksort', order=None) -> 'Series'
- Return the integer indices that would sort the Series values.
Override ndarray.argsort. Argsorts the value, omitting NA/null values,
and places the result in the same locations as the non-NA values.
Parameters
----------
axis : {0 or "index"}
Has no effect but is accepted for compatibility with numpy.
kind : {'mergesort', 'quicksort', 'heapsort', 'stable'}, default 'quicksort'
Choice of sorting algorithm. See :func:`numpy.sort` for more
information. 'mergesort' and 'stable' are the only stable algorithms.
order : None
Has no effect but is accepted for compatibility with numpy.
Returns
-------
Series[np.intp]
Positions of values within the sort order with -1 indicating
nan values.
See Also
--------
numpy.ndarray.argsort : Returns the indices that would sort this array.
- asfreq(self, freq, method=None, how: 'str | None' = None, normalize: 'bool' = False, fill_value=None) -> 'Series'
- Convert time series to specified frequency.
Returns the original data conformed to a new index with the specified
frequency.
If the index of this Series is a :class:`~pandas.PeriodIndex`, the new index
is the result of transforming the original index with
:meth:`PeriodIndex.asfreq <pandas.PeriodIndex.asfreq>` (so the original index
will map one-to-one to the new index).
Otherwise, the new index will be equivalent to ``pd.date_range(start, end,
freq=freq)`` where ``start`` and ``end`` are, respectively, the first and
last entries in the original index (see :func:`pandas.date_range`). The
values corresponding to any timesteps in the new index which were not present
in the original index will be null (``NaN``), unless a method for filling
such unknowns is provided (see the ``method`` parameter below).
The :meth:`resample` method is more appropriate if an operation on each group of
timesteps (such as an aggregate) is necessary to represent the data at the new
frequency.
Parameters
----------
freq : DateOffset or str
Frequency DateOffset or string.
method : {'backfill'/'bfill', 'pad'/'ffill'}, default None
Method to use for filling holes in reindexed Series (note this
does not fill NaNs that already were present):
* 'pad' / 'ffill': propagate last valid observation forward to next
valid
* 'backfill' / 'bfill': use NEXT valid observation to fill.
how : {'start', 'end'}, default end
For PeriodIndex only (see PeriodIndex.asfreq).
normalize : bool, default False
Whether to reset output index to midnight.
fill_value : scalar, optional
Value to use for missing values, applied during upsampling (note
this does not fill NaNs that already were present).
Returns
-------
Series
Series object reindexed to the specified frequency.
See Also
--------
reindex : Conform DataFrame to new index with optional filling logic.
Notes
-----
To learn more about the frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
Examples
--------
Start by creating a series with 4 one minute timestamps.
>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
s
2000-01-01 00:00:00 0.0
2000-01-01 00:01:00 NaN
2000-01-01 00:02:00 2.0
2000-01-01 00:03:00 3.0
Upsample the series into 30 second bins.
>>> df.asfreq(freq='30S')
s
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 NaN
2000-01-01 00:01:30 NaN
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 NaN
2000-01-01 00:03:00 3.0
Upsample again, providing a ``fill value``.
>>> df.asfreq(freq='30S', fill_value=9.0)
s
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 9.0
2000-01-01 00:01:00 NaN
2000-01-01 00:01:30 9.0
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 9.0
2000-01-01 00:03:00 3.0
Upsample again, providing a ``method``.
>>> df.asfreq(freq='30S', method='bfill')
s
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 NaN
2000-01-01 00:01:30 2.0
2000-01-01 00:02:00 2.0
2000-01-01 00:02:30 3.0
2000-01-01 00:03:00 3.0
- autocorr(self, lag=1) -> 'float'
- Compute the lag-N autocorrelation.
This method computes the Pearson correlation between
the Series and its shifted self.
Parameters
----------
lag : int, default 1
Number of lags to apply before performing autocorrelation.
Returns
-------
float
The Pearson correlation between self and self.shift(lag).
See Also
--------
Series.corr : Compute the correlation between two Series.
Series.shift : Shift index by desired number of periods.
DataFrame.corr : Compute pairwise correlation of columns.
DataFrame.corrwith : Compute pairwise correlation between rows or
columns of two DataFrame objects.
Notes
-----
If the Pearson correlation is not well defined return 'NaN'.
Examples
--------
>>> s = pd.Series([0.25, 0.5, 0.2, -0.05])
>>> s.autocorr() # doctest: +ELLIPSIS
0.10355...
>>> s.autocorr(lag=2) # doctest: +ELLIPSIS
-0.99999...
If the Pearson correlation is not well defined, then 'NaN' is returned.
>>> s = pd.Series([1, 0, 0, 0])
>>> s.autocorr()
nan
- between(self, left, right, inclusive='both') -> 'Series'
- Return boolean Series equivalent to left <= series <= right.
This function returns a boolean vector containing `True` wherever the
corresponding Series element is between the boundary values `left` and
`right`. NA values are treated as `False`.
Parameters
----------
left : scalar or list-like
Left boundary.
right : scalar or list-like
Right boundary.
inclusive : {"both", "neither", "left", "right"}
Include boundaries. Whether to set each bound as closed or open.
.. versionchanged:: 1.3.0
Returns
-------
Series
Series representing whether each element is between left and
right (inclusive).
See Also
--------
Series.gt : Greater than of series and other.
Series.lt : Less than of series and other.
Notes
-----
This function is equivalent to ``(left <= ser) & (ser <= right)``
Examples
--------
>>> s = pd.Series([2, 0, 4, 8, np.nan])
Boundary values are included by default:
>>> s.between(1, 4)
0 True
1 False
2 True
3 False
4 False
dtype: bool
With `inclusive` set to ``"neither"`` boundary values are excluded:
>>> s.between(1, 4, inclusive="neither")
0 True
1 False
2 False
3 False
4 False
dtype: bool
`left` and `right` can be any scalar value:
>>> s = pd.Series(['Alice', 'Bob', 'Carol', 'Eve'])
>>> s.between('Anna', 'Daniel')
0 False
1 True
2 True
3 False
dtype: bool
- bfill(self: 'Series', axis: 'None | Axis' = None, inplace: 'bool' = False, limit: 'None | int' = None, downcast=None) -> 'Series | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='bfill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- clip(self: 'Series', lower=None, upper=None, axis: 'Axis | None' = None, inplace: 'bool' = False, *args, **kwargs) -> 'Series | None'
- Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds
can be singular values or array like, and in the latter case
the clipping is performed element-wise in the specified axis.
Parameters
----------
lower : float or array-like, default None
Minimum threshold value. All values below this
threshold will be set to it. A missing
threshold (e.g `NA`) will not clip the value.
upper : float or array-like, default None
Maximum threshold value. All values above this
threshold will be set to it. A missing
threshold (e.g `NA`) will not clip the value.
axis : int or str axis name, optional
Align object with lower and upper along the given axis.
inplace : bool, default False
Whether to perform the operation in place on the data.
*args, **kwargs
Additional keywords have no effect but might be accepted
for compatibility with numpy.
Returns
-------
Series or DataFrame or None
Same type as calling object with the values outside the
clip boundaries replaced or None if ``inplace=True``.
See Also
--------
Series.clip : Trim values at input threshold in series.
DataFrame.clip : Trim values at input threshold in dataframe.
numpy.clip : Clip (limit) the values in an array.
Examples
--------
>>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
>>> df = pd.DataFrame(data)
>>> df
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
Clips per column using lower and upper thresholds:
>>> df.clip(-4, 6)
col_0 col_1
0 6 -2
1 -3 -4
2 0 6
3 -1 6
4 5 -4
Clips using specific lower and upper thresholds per column element:
>>> t = pd.Series([2, -4, -1, 6, 3])
>>> t
0 2
1 -4
2 -1
3 6
4 3
dtype: int64
>>> df.clip(t, t + 4, axis=0)
col_0 col_1
0 6 2
1 -3 -4
2 0 3
3 6 8
4 5 3
Clips using specific lower threshold per column element, with missing values:
>>> t = pd.Series([2, -4, np.NaN, 6, 3])
>>> t
0 2.0
1 -4.0
2 NaN
3 6.0
4 3.0
dtype: float64
>>> df.clip(t, axis=0)
col_0 col_1
0 9 2
1 -3 -4
2 0 6
3 6 8
4 5 3
- combine(self, other, func, fill_value=None) -> 'Series'
- Combine the Series with a Series or scalar according to `func`.
Combine the Series and `other` using `func` to perform elementwise
selection for combined Series.
`fill_value` is assumed when value is missing at some index
from one of the two objects being combined.
Parameters
----------
other : Series or scalar
The value(s) to be combined with the `Series`.
func : function
Function that takes two scalars as inputs and returns an element.
fill_value : scalar, optional
The value to assume when an index is missing from
one Series or the other. The default specifies to use the
appropriate NaN value for the underlying dtype of the Series.
Returns
-------
Series
The result of combining the Series with the other object.
See Also
--------
Series.combine_first : Combine Series values, choosing the calling
Series' values first.
Examples
--------
Consider 2 Datasets ``s1`` and ``s2`` containing
highest clocked speeds of different birds.
>>> s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
>>> s1
falcon 330.0
eagle 160.0
dtype: float64
>>> s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})
>>> s2
falcon 345.0
eagle 200.0
duck 30.0
dtype: float64
Now, to combine the two datasets and view the highest speeds
of the birds across the two datasets
>>> s1.combine(s2, max)
duck NaN
eagle 200.0
falcon 345.0
dtype: float64
In the previous example, the resulting value for duck is missing,
because the maximum of a NaN and a float is a NaN.
So, in the example, we set ``fill_value=0``,
so the maximum value returned will be the value from some dataset.
>>> s1.combine(s2, max, fill_value=0)
duck 30.0
eagle 200.0
falcon 345.0
dtype: float64
- combine_first(self, other) -> 'Series'
- Update null elements with value in the same location in 'other'.
Combine two Series objects by filling null values in one Series with
non-null values from the other Series. Result index will be the union
of the two indexes.
Parameters
----------
other : Series
The value(s) to be used for filling null values.
Returns
-------
Series
The result of combining the provided Series with the other object.
See Also
--------
Series.combine : Perform element-wise operation on two Series
using a given function.
Examples
--------
>>> s1 = pd.Series([1, np.nan])
>>> s2 = pd.Series([3, 4, 5])
>>> s1.combine_first(s2)
0 1.0
1 4.0
2 5.0
dtype: float64
Null values still persist if the location of that null value
does not exist in `other`
>>> s1 = pd.Series({'falcon': np.nan, 'eagle': 160.0})
>>> s2 = pd.Series({'eagle': 200.0, 'duck': 30.0})
>>> s1.combine_first(s2)
duck 30.0
eagle 160.0
falcon NaN
dtype: float64
- compare(self, other: 'Series', align_axis: 'Axis' = 1, keep_shape: 'bool' = False, keep_equal: 'bool' = False) -> 'DataFrame | Series'
- Compare to another Series and show the differences.
.. versionadded:: 1.1.0
Parameters
----------
other : Series
Object to compare with.
align_axis : {0 or 'index', 1 or 'columns'}, default 1
Determine which axis to align the comparison on.
* 0, or 'index' : Resulting differences are stacked vertically
with rows drawn alternately from self and other.
* 1, or 'columns' : Resulting differences are aligned horizontally
with columns drawn alternately from self and other.
keep_shape : bool, default False
If true, all rows and columns are kept.
Otherwise, only the ones with different values are kept.
keep_equal : bool, default False
If true, the result keeps values that are equal.
Otherwise, equal values are shown as NaNs.
Returns
-------
Series or DataFrame
If axis is 0 or 'index' the result will be a Series.
The resulting index will be a MultiIndex with 'self' and 'other'
stacked alternately at the inner level.
If axis is 1 or 'columns' the result will be a DataFrame.
It will have two columns namely 'self' and 'other'.
See Also
--------
DataFrame.compare : Compare with another DataFrame and show differences.
Notes
-----
Matching NaNs will not appear as a difference.
Examples
--------
>>> s1 = pd.Series(["a", "b", "c", "d", "e"])
>>> s2 = pd.Series(["a", "a", "c", "b", "e"])
Align the differences on columns
>>> s1.compare(s2)
self other
1 b a
3 d b
Stack the differences on indices
>>> s1.compare(s2, align_axis=0)
1 self b
other a
3 self d
other b
dtype: object
Keep all original rows
>>> s1.compare(s2, keep_shape=True)
self other
0 NaN NaN
1 b a
2 NaN NaN
3 d b
4 NaN NaN
Keep all original rows and also all original values
>>> s1.compare(s2, keep_shape=True, keep_equal=True)
self other
0 a a
1 b a
2 c c
3 d b
4 e e
- corr(self, other, method='pearson', min_periods=None) -> 'float'
- Compute correlation with `other` Series, excluding missing values.
Parameters
----------
other : Series
Series with which to compute the correlation.
method : {'pearson', 'kendall', 'spearman'} or callable
Method used to compute correlation:
- pearson : Standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable: Callable with input two 1d ndarrays and returning a float.
.. warning::
Note that the returned matrix from corr will have 1 along the
diagonals and will be symmetric regardless of the callable's
behavior.
min_periods : int, optional
Minimum number of observations needed to have a valid result.
Returns
-------
float
Correlation with other.
See Also
--------
DataFrame.corr : Compute pairwise correlation between columns.
DataFrame.corrwith : Compute pairwise correlation with another
DataFrame or Series.
Examples
--------
>>> def histogram_intersection(a, b):
... v = np.minimum(a, b).sum().round(decimals=1)
... return v
>>> s1 = pd.Series([.2, .0, .6, .2])
>>> s2 = pd.Series([.3, .6, .0, .1])
>>> s1.corr(s2, method=histogram_intersection)
0.3
- count(self, level=None)
- Return number of non-NA/null observations in the Series.
Parameters
----------
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a smaller Series.
Returns
-------
int or Series (if level specified)
Number of non-null values in the Series.
See Also
--------
DataFrame.count : Count non-NA cells for each column or row.
Examples
--------
>>> s = pd.Series([0.0, 1.0, np.nan])
>>> s.count()
2
- cov(self, other: 'Series', min_periods: 'int | None' = None, ddof: 'int | None' = 1) -> 'float'
- Compute covariance with Series, excluding missing values.
Parameters
----------
other : Series
Series with which to compute the covariance.
min_periods : int, optional
Minimum number of observations needed to have a valid result.
ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations
is ``N - ddof``, where ``N`` represents the number of elements.
.. versionadded:: 1.1.0
Returns
-------
float
Covariance between Series and other normalized by N-1
(unbiased estimator).
See Also
--------
DataFrame.cov : Compute pairwise covariance of columns.
Examples
--------
>>> s1 = pd.Series([0.90010907, 0.13484424, 0.62036035])
>>> s2 = pd.Series([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2)
-0.01685762652715874
- cummax(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
maximum.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
scalar or Series
Return cumulative maximum of scalar or Series.
See Also
--------
core.window.Expanding.max : Similar functionality
but ignores ``NaN`` values.
Series.max : Return the maximum over
Series axis.
Series.cummax : Return cumulative maximum over Series axis.
Series.cummin : Return cumulative minimum over Series axis.
Series.cumsum : Return cumulative sum over Series axis.
Series.cumprod : Return cumulative product over Series axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cummax()
0 2.0
1 NaN
2 5.0
3 5.0
4 5.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cummax(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the maximum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cummax()
A B
0 2.0 1.0
1 3.0 NaN
2 3.0 1.0
To iterate over columns and find the maximum in each row,
use ``axis=1``
>>> df.cummax(axis=1)
A B
0 2.0 2.0
1 3.0 NaN
2 1.0 1.0
- cummin(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
minimum.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
scalar or Series
Return cumulative minimum of scalar or Series.
See Also
--------
core.window.Expanding.min : Similar functionality
but ignores ``NaN`` values.
Series.min : Return the minimum over
Series axis.
Series.cummax : Return cumulative maximum over Series axis.
Series.cummin : Return cumulative minimum over Series axis.
Series.cumsum : Return cumulative sum over Series axis.
Series.cumprod : Return cumulative product over Series axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cummin()
0 2.0
1 NaN
2 2.0
3 -1.0
4 -1.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cummin(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the minimum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cummin()
A B
0 2.0 1.0
1 2.0 NaN
2 1.0 0.0
To iterate over columns and find the minimum in each row,
use ``axis=1``
>>> df.cummin(axis=1)
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
- cumprod(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
product.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
scalar or Series
Return cumulative product of scalar or Series.
See Also
--------
core.window.Expanding.prod : Similar functionality
but ignores ``NaN`` values.
Series.prod : Return the product over
Series axis.
Series.cummax : Return cumulative maximum over Series axis.
Series.cummin : Return cumulative minimum over Series axis.
Series.cumsum : Return cumulative sum over Series axis.
Series.cumprod : Return cumulative product over Series axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cumprod()
0 2.0
1 NaN
2 10.0
3 -10.0
4 -0.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cumprod(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the product
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cumprod()
A B
0 2.0 1.0
1 6.0 NaN
2 6.0 0.0
To iterate over columns and find the product in each row,
use ``axis=1``
>>> df.cumprod(axis=1)
A B
0 2.0 2.0
1 3.0 NaN
2 1.0 0.0
- cumsum(self, axis=None, skipna=True, *args, **kwargs)
- Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative
sum.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
*args, **kwargs
Additional keywords have no effect but might be accepted for
compatibility with NumPy.
Returns
-------
scalar or Series
Return cumulative sum of scalar or Series.
See Also
--------
core.window.Expanding.sum : Similar functionality
but ignores ``NaN`` values.
Series.sum : Return the sum over
Series axis.
Series.cummax : Return cumulative maximum over Series axis.
Series.cummin : Return cumulative minimum over Series axis.
Series.cumsum : Return cumulative sum over Series axis.
Series.cumprod : Return cumulative product over Series axis.
Examples
--------
**Series**
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 NaN
2 5.0
3 -1.0
4 0.0
dtype: float64
By default, NA values are ignored.
>>> s.cumsum()
0 2.0
1 NaN
2 7.0
3 6.0
4 6.0
dtype: float64
To include NA values in the operation, use ``skipna=False``
>>> s.cumsum(skipna=False)
0 2.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
**DataFrame**
>>> df = pd.DataFrame([[2.0, 1.0],
... [3.0, np.nan],
... [1.0, 0.0]],
... columns=list('AB'))
>>> df
A B
0 2.0 1.0
1 3.0 NaN
2 1.0 0.0
By default, iterates over rows and finds the sum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.
>>> df.cumsum()
A B
0 2.0 1.0
1 5.0 NaN
2 6.0 1.0
To iterate over columns and find the sum in each row,
use ``axis=1``
>>> df.cumsum(axis=1)
A B
0 2.0 3.0
1 3.0 NaN
2 1.0 1.0
- diff(self, periods: 'int' = 1) -> 'Series'
- First discrete difference of element.
Calculates the difference of a Series element compared with another
element in the Series (default is element in previous row).
Parameters
----------
periods : int, default 1
Periods to shift for calculating difference, accepts negative
values.
Returns
-------
Series
First differences of the Series.
See Also
--------
Series.pct_change: Percent change over given number of periods.
Series.shift: Shift index by desired number of periods with an
optional time freq.
DataFrame.diff: First discrete difference of object.
Notes
-----
For boolean dtypes, this uses :meth:`operator.xor` rather than
:meth:`operator.sub`.
The result is calculated according to current dtype in Series,
however dtype of the result is always float64.
Examples
--------
Difference with previous row
>>> s = pd.Series([1, 1, 2, 3, 5, 8])
>>> s.diff()
0 NaN
1 0.0
2 1.0
3 1.0
4 2.0
5 3.0
dtype: float64
Difference with 3rd previous row
>>> s.diff(periods=3)
0 NaN
1 NaN
2 NaN
3 2.0
4 4.0
5 6.0
dtype: float64
Difference with following row
>>> s.diff(periods=-1)
0 0.0
1 -1.0
2 -1.0
3 -2.0
4 -3.0
5 NaN
dtype: float64
Overflow in input dtype
>>> s = pd.Series([1, 0], dtype=np.uint8)
>>> s.diff()
0 NaN
1 255.0
dtype: float64
- div = truediv(self, other, level=None, fill_value=None, axis=0)
- divide = truediv(self, other, level=None, fill_value=None, axis=0)
- divmod(self, other, level=None, fill_value=None, axis=0)
- Return Integer division and modulo of series and other, element-wise (binary operator `divmod`).
Equivalent to ``divmod(series, other)``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
2-Tuple of Series
The result of the operation.
See Also
--------
Series.rdivmod : Reverse of the Integer division and modulo operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.divmod(b, fill_value=0)
(a 1.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64,
a 0.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64)
- dot(self, other)
- Compute the dot product between the Series and the columns of other.
This method computes the dot product between the Series and another
one, or the Series and each columns of a DataFrame, or the Series and
each columns of an array.
It can also be called using `self @ other` in Python >= 3.5.
Parameters
----------
other : Series, DataFrame or array-like
The other object to compute the dot product with its columns.
Returns
-------
scalar, Series or numpy.ndarray
Return the dot product of the Series and other if other is a
Series, the Series of the dot product of Series and each rows of
other if other is a DataFrame or a numpy.ndarray between the Series
and each columns of the numpy array.
See Also
--------
DataFrame.dot: Compute the matrix product with the DataFrame.
Series.mul: Multiplication of series and other, element-wise.
Notes
-----
The Series and other has to share the same index if other is a Series
or a DataFrame.
Examples
--------
>>> s = pd.Series([0, 1, 2, 3])
>>> other = pd.Series([-1, 2, -3, 4])
>>> s.dot(other)
8
>>> s @ other
8
>>> df = pd.DataFrame([[0, 1], [-2, 3], [4, -5], [6, 7]])
>>> s.dot(df)
0 24
1 14
dtype: int64
>>> arr = np.array([[0, 1], [-2, 3], [4, -5], [6, 7]])
>>> s.dot(arr)
array([24, 14])
- drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise') -> 'Series'
- Return Series with specified index labels removed.
Remove elements of a Series based on specifying the index labels.
When using a multi-index, labels on different levels can be removed
by specifying the level.
Parameters
----------
labels : single label or list-like
Index labels to drop.
axis : 0, default 0
Redundant for application on Series.
index : single label or list-like
Redundant for application on Series, but 'index' can be used instead
of 'labels'.
columns : single label or list-like
No change is made to the Series; use 'index' or 'labels' instead.
level : int or level name, optional
For MultiIndex, level for which the labels will be removed.
inplace : bool, default False
If True, do operation inplace and return None.
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and only existing labels are dropped.
Returns
-------
Series or None
Series with specified index labels removed or None if ``inplace=True``.
Raises
------
KeyError
If none of the labels are found in the index.
See Also
--------
Series.reindex : Return only specified index labels of Series.
Series.dropna : Return series without null values.
Series.drop_duplicates : Return Series with duplicate values removed.
DataFrame.drop : Drop specified labels from rows or columns.
Examples
--------
>>> s = pd.Series(data=np.arange(3), index=['A', 'B', 'C'])
>>> s
A 0
B 1
C 2
dtype: int64
Drop labels B en C
>>> s.drop(labels=['B', 'C'])
A 0
dtype: int64
Drop 2nd level label in MultiIndex Series
>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
... ['speed', 'weight', 'length']],
... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
... index=midx)
>>> s
lama speed 45.0
weight 200.0
length 1.2
cow speed 30.0
weight 250.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
dtype: float64
>>> s.drop(labels='weight', level=1)
lama speed 45.0
length 1.2
cow speed 30.0
length 1.5
falcon speed 320.0
length 0.3
dtype: float64
- drop_duplicates(self, keep='first', inplace=False) -> 'Series | None'
- Return Series with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
Method to handle dropping duplicates:
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
inplace : bool, default ``False``
If ``True``, performs operation inplace and returns None.
Returns
-------
Series or None
Series with duplicates dropped or None if ``inplace=True``.
See Also
--------
Index.drop_duplicates : Equivalent method on Index.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Series.duplicated : Related method on Series, indicating duplicate
Series values.
Examples
--------
Generate a Series with duplicated entries.
>>> s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
... name='animal')
>>> s
0 lama
1 cow
2 lama
3 beetle
4 lama
5 hippo
Name: animal, dtype: object
With the 'keep' parameter, the selection behaviour of duplicated values
can be changed. The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> s.drop_duplicates()
0 lama
1 cow
3 beetle
5 hippo
Name: animal, dtype: object
The value 'last' for parameter 'keep' keeps the last occurrence for
each set of duplicated entries.
>>> s.drop_duplicates(keep='last')
1 cow
3 beetle
4 lama
5 hippo
Name: animal, dtype: object
The value ``False`` for parameter 'keep' discards all sets of
duplicated entries. Setting the value of 'inplace' to ``True`` performs
the operation inplace and returns ``None``.
>>> s.drop_duplicates(keep=False, inplace=True)
>>> s
1 cow
3 beetle
5 hippo
Name: animal, dtype: object
- dropna(self, axis=0, inplace=False, how=None)
- Return a new Series with missing values removed.
See the :ref:`User Guide <missing_data>` for more on which values are
considered missing, and how to work with missing data.
Parameters
----------
axis : {0 or 'index'}, default 0
There is only one axis to drop values from.
inplace : bool, default False
If True, do operation inplace and return None.
how : str, optional
Not in use. Kept for compatibility.
Returns
-------
Series or None
Series with NA entries dropped from it or None if ``inplace=True``.
See Also
--------
Series.isna: Indicate missing values.
Series.notna : Indicate existing (non-missing) values.
Series.fillna : Replace missing values.
DataFrame.dropna : Drop rows or columns which contain NA values.
Index.dropna : Drop missing indices.
Examples
--------
>>> ser = pd.Series([1., 2., np.nan])
>>> ser
0 1.0
1 2.0
2 NaN
dtype: float64
Drop NA values from a Series.
>>> ser.dropna()
0 1.0
1 2.0
dtype: float64
Keep the Series with valid entries in the same variable.
>>> ser.dropna(inplace=True)
>>> ser
0 1.0
1 2.0
dtype: float64
Empty strings are not considered NA values. ``None`` is considered an
NA value.
>>> ser = pd.Series([np.NaN, 2, pd.NaT, '', None, 'I stay'])
>>> ser
0 NaN
1 2
2 NaT
3
4 None
5 I stay
dtype: object
>>> ser.dropna()
1 2
3
5 I stay
dtype: object
- duplicated(self, keep='first') -> 'Series'
- Indicate duplicate Series values.
Duplicated values are indicated as ``True`` values in the resulting
Series. Either all duplicates, all except the first or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
Method to handle dropping duplicates:
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
Series[bool]
Series indicating whether each value has occurred in the
preceding values.
See Also
--------
Index.duplicated : Equivalent method on pandas.Index.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Series.drop_duplicates : Remove duplicate values from Series.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set on False and all others on True:
>>> animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> animals.duplicated()
0 False
1 False
2 True
3 False
4 True
dtype: bool
which is equivalent to
>>> animals.duplicated(keep='first')
0 False
1 False
2 True
3 False
4 True
dtype: bool
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> animals.duplicated(keep='last')
0 True
1 False
2 True
3 False
4 False
dtype: bool
By setting keep on ``False``, all duplicates are True:
>>> animals.duplicated(keep=False)
0 True
1 False
2 True
3 False
4 True
dtype: bool
- eq(self, other, level=None, fill_value=None, axis=0)
- Return Equal to of series and other, element-wise (binary operator `eq`).
Equivalent to ``series == other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.eq(b, fill_value=0)
a True
b False
c False
d False
e False
dtype: bool
- explode(self, ignore_index: 'bool' = False) -> 'Series'
- Transform each element of a list-like to a row.
.. versionadded:: 0.25.0
Parameters
----------
ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.1.0
Returns
-------
Series
Exploded lists to rows; index will be duplicated for these rows.
See Also
--------
Series.str.split : Split string values on specified separator.
Series.unstack : Unstack, a.k.a. pivot, Series with MultiIndex
to produce DataFrame.
DataFrame.melt : Unpivot a DataFrame from wide format to long format.
DataFrame.explode : Explode a DataFrame from list-like
columns to long format.
Notes
-----
This routine will explode list-likes including lists, tuples, sets,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged, and empty list-likes will
result in a np.nan for that row. In addition, the ordering of elements in
the output will be non-deterministic when exploding sets.
Reference :ref:`the user guide <reshaping.explode>` for more examples.
Examples
--------
>>> s = pd.Series([[1, 2, 3], 'foo', [], [3, 4]])
>>> s
0 [1, 2, 3]
1 foo
2 []
3 [3, 4]
dtype: object
>>> s.explode()
0 1
0 2
0 3
1 foo
2 NaN
3 3
3 4
dtype: object
- ffill(self: 'Series', axis: 'None | Axis' = None, inplace: 'bool' = False, limit: 'None | int' = None, downcast=None) -> 'Series | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='ffill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- fillna(self, value: 'object | ArrayLike | None' = None, method: 'FillnaOptions | None' = None, axis=None, inplace=False, limit=None, downcast=None) -> 'Series | None'
- Fill NA/NaN values using the specified method.
Parameters
----------
value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use next valid observation to fill gap.
axis : {0 or 'index'}
Axis along which to fill missing values.
inplace : bool, default False
If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Series or None
Object with missing values filled or None if ``inplace=True``.
See Also
--------
interpolate : Fill NaN values using interpolation.
reindex : Conform object to new index.
asfreq : Convert TimeSeries to specified frequency.
Examples
--------
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
... [3, 4, np.nan, 1],
... [np.nan, np.nan, np.nan, np.nan],
... [np.nan, 3, np.nan, 4]],
... columns=list("ABCD"))
>>> df
A B C D
0 NaN 2.0 NaN 0.0
1 3.0 4.0 NaN 1.0
2 NaN NaN NaN NaN
3 NaN 3.0 NaN 4.0
Replace all NaN elements with 0s.
>>> df.fillna(0)
A B C D
0 0.0 2.0 0.0 0.0
1 3.0 4.0 0.0 1.0
2 0.0 0.0 0.0 0.0
3 0.0 3.0 0.0 4.0
We can also propagate non-null values forward or backward.
>>> df.fillna(method="ffill")
A B C D
0 NaN 2.0 NaN 0.0
1 3.0 4.0 NaN 1.0
2 3.0 4.0 NaN 1.0
3 3.0 3.0 NaN 4.0
Replace all NaN elements in column 'A', 'B', 'C', and 'D', with 0, 1,
2, and 3 respectively.
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
A B C D
0 0.0 2.0 2.0 0.0
1 3.0 4.0 2.0 1.0
2 0.0 1.0 2.0 3.0
3 0.0 3.0 2.0 4.0
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
A B C D
0 0.0 2.0 2.0 0.0
1 3.0 4.0 NaN 1.0
2 NaN 1.0 NaN 3.0
3 NaN 3.0 NaN 4.0
When filling using a DataFrame, replacement happens along
the same column names and same indices
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
A B C D
0 0.0 2.0 0.0 0.0
1 3.0 4.0 0.0 1.0
2 0.0 0.0 0.0 NaN
3 0.0 3.0 0.0 4.0
Note that column D is not affected since it is not present in df2.
- floordiv(self, other, level=None, fill_value=None, axis=0)
- Return Integer division of series and other, element-wise (binary operator `floordiv`).
Equivalent to ``series // other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.rfloordiv : Reverse of the Integer division operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.floordiv(b, fill_value=0)
a 1.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
- ge(self, other, level=None, fill_value=None, axis=0)
- Return Greater than or equal to of series and other, element-wise (binary operator `ge`).
Equivalent to ``series >= other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
e 1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a 0.0
b 1.0
c 2.0
d NaN
f 1.0
dtype: float64
>>> a.ge(b, fill_value=0)
a True
b True
c False
d False
e True
f False
dtype: bool
- groupby(self, by=None, axis=0, level=None, as_index: 'bool' = True, sort: 'bool' = True, group_keys: 'bool' = True, squeeze: 'bool | lib.NoDefault' = <no_default>, observed: 'bool' = False, dropna: 'bool' = True) -> 'SeriesGroupBy'
- Group Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
Parameters
----------
by : mapping, function, label, or list of labels
Used to determine the groups for the groupby.
If ``by`` is a function, it's called on each value of the object's
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series' values are first
aligned; see ``.align()`` method). If a list or ndarray of length
equal to the selected axis is passed (see the `groupby user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
the values are used as-is to determine the groups. A label or list
of labels may be passed to group by the columns in ``self``.
Notice that a tuple is interpreted as a (single) key.
axis : {0 or 'index', 1 or 'columns'}, default 0
Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular
level or levels.
as_index : bool, default True
For aggregated output, return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively "SQL-style" grouped output.
sort : bool, default True
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group.
group_keys : bool, default True
When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False
Reduce the dimensionality of the return type if possible,
otherwise return a consistent type.
.. deprecated:: 1.1.0
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.
dropna : bool, default True
If True, and if group keys contain NA values, NA values together
with row/column will be dropped.
If False, NA values will also be treated as the key in groups.
.. versionadded:: 1.1.0
Returns
-------
SeriesGroupBy
Returns a groupby object that contains information about the groups.
See Also
--------
resample : Convenience method for frequency conversion and resampling
of time series.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/groupby.html>`__ for more
detailed usage and examples, including splitting an object into groups,
iterating through groups, selecting a group, aggregation, and more.
Examples
--------
>>> ser = pd.Series([390., 350., 30., 20.],
... index=['Falcon', 'Falcon', 'Parrot', 'Parrot'], name="Max Speed")
>>> ser
Falcon 390.0
Falcon 350.0
Parrot 30.0
Parrot 20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", "b"]).mean()
a 210.0
b 185.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Falcon 370.0
Parrot 25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(ser > 100).mean()
Max Speed
False 25.0
True 370.0
Name: Max Speed, dtype: float64
**Grouping by Indexes**
We can groupby different levels of a hierarchical index
using the `level` parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
... ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> ser = pd.Series([390., 350., 30., 20.], index=index, name="Max Speed")
>>> ser
Animal Type
Falcon Captive 390.0
Wild 350.0
Parrot Captive 30.0
Wild 20.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level=0).mean()
Animal
Falcon 370.0
Parrot 25.0
Name: Max Speed, dtype: float64
>>> ser.groupby(level="Type").mean()
Type
Captive 210.0
Wild 185.0
Name: Max Speed, dtype: float64
We can also choose to include `NA` in group keys or not by defining
`dropna` parameter, the default setting is `True`.
>>> ser = pd.Series([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
>>> ser.groupby(level=0).sum()
a 3
b 3
dtype: int64
>>> ser.groupby(level=0, dropna=False).sum()
a 3
b 3
NaN 3
dtype: int64
>>> arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot']
>>> ser = pd.Series([390., 350., 30., 20.], index=arrays, name="Max Speed")
>>> ser.groupby(["a", "b", "a", np.nan]).mean()
a 210.0
b 350.0
Name: Max Speed, dtype: float64
>>> ser.groupby(["a", "b", "a", np.nan], dropna=False).mean()
a 210.0
b 350.0
NaN 20.0
Name: Max Speed, dtype: float64
- gt(self, other, level=None, fill_value=None, axis=0)
- Return Greater than of series and other, element-wise (binary operator `gt`).
Equivalent to ``series > other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
e 1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a 0.0
b 1.0
c 2.0
d NaN
f 1.0
dtype: float64
>>> a.gt(b, fill_value=0)
a True
b False
c False
d False
e True
f False
dtype: bool
- hist = hist_series(self, by=None, ax=None, grid: 'bool' = True, xlabelsize: 'int | None' = None, xrot: 'float | None' = None, ylabelsize: 'int | None' = None, yrot: 'float | None' = None, figsize: 'tuple[int, int] | None' = None, bins: 'int | Sequence[int]' = 10, backend: 'str | None' = None, legend: 'bool' = False, **kwargs)
- Draw histogram of the input series using matplotlib.
Parameters
----------
by : object, optional
If passed, then used to form histograms for separate groups.
ax : matplotlib axis object
If not passed, uses gca().
grid : bool, default True
Whether to show axis grid lines.
xlabelsize : int, default None
If specified changes the x-axis label size.
xrot : float, default None
Rotation of x axis labels.
ylabelsize : int, default None
If specified changes the y-axis label size.
yrot : float, default None
Rotation of y axis labels.
figsize : tuple, default None
Figure size in inches by default.
bins : int or sequence, default 10
Number of histogram bins to be used. If an integer is given, bins + 1
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
backend : str, default None
Backend to use instead of the backend specified in the option
``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
specify the ``plotting.backend`` for the whole session, set
``pd.options.plotting.backend``.
.. versionadded:: 1.0.0
legend : bool, default False
Whether to show the legend.
.. versionadded:: 1.1.0
**kwargs
To be passed to the actual plotting function.
Returns
-------
matplotlib.AxesSubplot
A histogram plot.
See Also
--------
matplotlib.axes.Axes.hist : Plot a histogram using matplotlib.
- idxmax(self, axis=0, skipna=True, *args, **kwargs)
- Return the row label of the maximum value.
If multiple values equal the maximum, the first row label with that
value is returned.
Parameters
----------
axis : int, default 0
For compatibility with DataFrame.idxmax. Redundant for application
on Series.
skipna : bool, default True
Exclude NA/null values. If the entire Series is NA, the result
will be NA.
*args, **kwargs
Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
Returns
-------
Index
Label of the maximum value.
Raises
------
ValueError
If the Series is empty.
See Also
--------
numpy.argmax : Return indices of the maximum values
along the given axis.
DataFrame.idxmax : Return index of first occurrence of maximum
over requested axis.
Series.idxmin : Return index *label* of the first occurrence
of minimum of values.
Notes
-----
This method is the Series version of ``ndarray.argmax``. This method
returns the label of the maximum, while ``ndarray.argmax`` returns
the position. To get the position, use ``series.values.argmax()``.
Examples
--------
>>> s = pd.Series(data=[1, None, 4, 3, 4],
... index=['A', 'B', 'C', 'D', 'E'])
>>> s
A 1.0
B NaN
C 4.0
D 3.0
E 4.0
dtype: float64
>>> s.idxmax()
'C'
If `skipna` is False and there is an NA value in the data,
the function returns ``nan``.
>>> s.idxmax(skipna=False)
nan
- idxmin(self, axis=0, skipna=True, *args, **kwargs)
- Return the row label of the minimum value.
If multiple values equal the minimum, the first row label with that
value is returned.
Parameters
----------
axis : int, default 0
For compatibility with DataFrame.idxmin. Redundant for application
on Series.
skipna : bool, default True
Exclude NA/null values. If the entire Series is NA, the result
will be NA.
*args, **kwargs
Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
Returns
-------
Index
Label of the minimum value.
Raises
------
ValueError
If the Series is empty.
See Also
--------
numpy.argmin : Return indices of the minimum values
along the given axis.
DataFrame.idxmin : Return index of first occurrence of minimum
over requested axis.
Series.idxmax : Return index *label* of the first occurrence
of maximum of values.
Notes
-----
This method is the Series version of ``ndarray.argmin``. This method
returns the label of the minimum, while ``ndarray.argmin`` returns
the position. To get the position, use ``series.values.argmin()``.
Examples
--------
>>> s = pd.Series(data=[1, None, 4, 1],
... index=['A', 'B', 'C', 'D'])
>>> s
A 1.0
B NaN
C 4.0
D 1.0
dtype: float64
>>> s.idxmin()
'A'
If `skipna` is False and there is an NA value in the data,
the function returns ``nan``.
>>> s.idxmin(skipna=False)
nan
- info(self, verbose: 'bool | None' = None, buf: 'IO[str] | None' = None, max_cols: 'int | None' = None, memory_usage: 'bool | str | None' = None, show_counts: 'bool' = True) -> 'None'
- Print a concise summary of a Series.
This method prints information about a Series including
the index dtype, non-null values and memory usage.
.. versionadded:: 1.4.0
Parameters
----------
data : Series
Series to print information about.
verbose : bool, optional
Whether to print the full summary. By default, the setting in
``pandas.options.display.max_info_columns`` is followed.
buf : writable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to
sys.stdout. Pass a writable buffer if you need to further process
the output.
memory_usage : bool, str, optional
Specifies whether total memory usage of the Series
elements (including the index) should be displayed. By default,
this follows the ``pandas.options.display.memory_usage`` setting.
True always show memory usage. False never shows memory usage.
A value of 'deep' is equivalent to "True with deep introspection".
Memory usage is shown in human-readable units (base-2
representation). Without deep introspection a memory estimation is
made based in column dtype and number of rows assuming values
consume the same memory amount for corresponding dtypes. With deep
memory introspection, a real memory usage calculation is performed
at the cost of computational resources.
show_counts : bool, optional
Whether to show the non-null counts. By default, this is shown
only if the DataFrame is smaller than
``pandas.options.display.max_info_rows`` and
``pandas.options.display.max_info_columns``. A value of True always
shows the counts, and False never shows the counts.
Returns
-------
None
This method prints a summary of a Series and returns None.
See Also
--------
Series.describe: Generate descriptive statistics of Series.
Series.memory_usage: Memory usage of Series.
Examples
--------
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> s = pd.Series(text_values, index=int_values)
>>> s.info()
<class 'pandas.core.series.Series'>
Int64Index: 5 entries, 1 to 5
Series name: None
Non-Null Count Dtype
-------------- -----
5 non-null object
dtypes: object(1)
memory usage: 80.0+ bytes
Prints a summary excluding information about its values:
>>> s.info(verbose=False)
<class 'pandas.core.series.Series'>
Int64Index: 5 entries, 1 to 5
dtypes: object(1)
memory usage: 80.0+ bytes
Pipe output of Series.info to buffer instead of sys.stdout, get
buffer content and writes to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> s.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
... encoding="utf-8") as f: # doctest: +SKIP
... f.write(s)
260
The `memory_usage` parameter allows deep introspection mode, specially
useful for big Series and fine-tune memory optimization:
>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> s = pd.Series(np.random.choice(['a', 'b', 'c'], 10 ** 6))
>>> s.info()
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count Dtype
-------------- -----
1000000 non-null object
dtypes: object(1)
memory usage: 7.6+ MB
>>> s.info(memory_usage='deep')
<class 'pandas.core.series.Series'>
RangeIndex: 1000000 entries, 0 to 999999
Series name: None
Non-Null Count Dtype
-------------- -----
1000000 non-null object
dtypes: object(1)
memory usage: 55.3 MB
- interpolate(self: 'Series', method: 'str' = 'linear', axis: 'Axis' = 0, limit: 'int | None' = None, inplace: 'bool' = False, limit_direction: 'str | None' = None, limit_area: 'str | None' = None, downcast: 'str | None' = None, **kwargs) -> 'Series | None'
- Fill NaN values using an interpolation method.
Please note that only ``method='linear'`` is supported for
DataFrame/Series with a MultiIndex.
Parameters
----------
method : str, default 'linear'
Interpolation technique to use. One of:
* 'linear': Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
* 'time': Works on daily and higher resolution data to interpolate
given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
'barycentric', 'polynomial': Passed to
`scipy.interpolate.interp1d`. These methods use the numerical
values of the index. Both 'polynomial' and 'spline' require that
you also specify an `order` (int), e.g.
``df.interpolate(method='polynomial', order=5)``.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima',
'cubicspline': Wrappers around the SciPy interpolation methods of
similar names. See `Notes`.
* 'from_derivatives': Refers to
`scipy.interpolate.BPoly.from_derivatives` which
replaces 'piecewise_polynomial' interpolation method in
scipy 0.18.
axis : {{0 or 'index', 1 or 'columns', None}}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {{'forward', 'backward', 'both'}}, Optional
Consecutive NaNs will be filled in this direction.
If limit is specified:
* If 'method' is 'pad' or 'ffill', 'limit_direction' must be 'forward'.
* If 'method' is 'backfill' or 'bfill', 'limit_direction' must be
'backwards'.
If 'limit' is not specified:
* If 'method' is 'backfill' or 'bfill', the default is 'backward'
* else the default is 'forward'
.. versionchanged:: 1.1.0
raises ValueError if `limit_direction` is 'forward' or 'both' and
method is 'backfill' or 'bfill'.
raises ValueError if `limit_direction` is 'backward' or 'both' and
method is 'pad' or 'ffill'.
limit_area : {{`None`, 'inside', 'outside'}}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.
* ``None``: No fill restriction.
* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
``**kwargs`` : optional
Keyword arguments to pass on to the interpolating function.
Returns
-------
Series or DataFrame or None
Returns the same object type as the caller, interpolated at
some or all ``NaN`` values or None if ``inplace=True``.
See Also
--------
fillna : Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator : Piecewise cubic polynomials
(Akima interpolator).
scipy.interpolate.BPoly.from_derivatives : Piecewise polynomial in the
Bernstein basis.
scipy.interpolate.interp1d : Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator : Interpolate polynomial (Krogh
interpolator).
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
interpolation.
scipy.interpolate.CubicSpline : Cubic spline data interpolator.
Notes
-----
The 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
`SciPy documentation
<https://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `SciPy tutorial
<https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.
Examples
--------
Filling in ``NaN`` in a :class:`~pandas.Series` via linear
interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0 0.0
1 1.0
2 NaN
3 3.0
dtype: float64
>>> s.interpolate()
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
Filling in ``NaN`` in a Series by padding, but filling at most two
consecutive ``NaN`` at a time.
>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object
Filling in ``NaN`` in a Series via polynomial interpolation or splines:
Both 'polynomial' and 'spline' methods require that you also specify
an ``order`` (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0 0.000000
1 2.000000
2 4.666667
3 8.000000
dtype: float64
Fill the DataFrame forward (that is, going down) along each column
using linear interpolation.
Note how the last entry in column 'a' is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column 'b' remains ``NaN``, because there
is no entry before it to use for interpolation.
>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
... (np.nan, 2.0, np.nan, np.nan),
... (2.0, 3.0, np.nan, 9.0),
... (np.nan, 4.0, -4.0, 16.0)],
... columns=list('abcd'))
>>> df
a b c d
0 0.0 NaN -1.0 1.0
1 NaN 2.0 NaN NaN
2 2.0 3.0 NaN 9.0
3 NaN 4.0 -4.0 16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d
0 0.0 NaN -1.0 1.0
1 1.0 2.0 -2.0 5.0
2 2.0 3.0 -3.0 9.0
3 2.0 4.0 -4.0 16.0
Using polynomial interpolation.
>>> df['d'].interpolate(method='polynomial', order=2)
0 1.0
1 4.0
2 9.0
3 16.0
Name: d, dtype: float64
- isin(self, values) -> 'Series'
- Whether elements in Series are contained in `values`.
Return a boolean Series showing whether each element in the Series
matches an element in the passed sequence of `values` exactly.
Parameters
----------
values : set or list-like
The sequence of values to test. Passing in a single string will
raise a ``TypeError``. Instead, turn a single string into a
list of one element.
Returns
-------
Series
Series of booleans indicating if each element is in values.
Raises
------
TypeError
* If `values` is a string
See Also
--------
DataFrame.isin : Equivalent method on DataFrame.
Examples
--------
>>> s = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama',
... 'hippo'], name='animal')
>>> s.isin(['cow', 'lama'])
0 True
1 True
2 True
3 False
4 True
5 False
Name: animal, dtype: bool
To invert the boolean values, use the ``~`` operator:
>>> ~s.isin(['cow', 'lama'])
0 False
1 False
2 False
3 True
4 False
5 True
Name: animal, dtype: bool
Passing a single string as ``s.isin('lama')`` will raise an error. Use
a list of one element instead:
>>> s.isin(['lama'])
0 True
1 False
2 True
3 False
4 True
5 False
Name: animal, dtype: bool
Strings and integers are distinct and are therefore not comparable:
>>> pd.Series([1]).isin(['1'])
0 False
dtype: bool
>>> pd.Series([1.1]).isin(['1.1'])
0 False
dtype: bool
- isna(self) -> 'Series'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or :attr:`numpy.NaN`, gets mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
Series
Mask of bool values for each element in Series that
indicates whether an element is an NA value.
See Also
--------
Series.isnull : Alias of isna.
Series.notna : Boolean inverse of isna.
Series.dropna : Omit axes labels with missing values.
isna : Top-level isna.
Examples
--------
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.isna()
0 False
1 False
2 True
dtype: bool
- isnull(self) -> 'Series'
- Series.isnull is an alias for Series.isna.
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or :attr:`numpy.NaN`, gets mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
Series
Mask of bool values for each element in Series that
indicates whether an element is an NA value.
See Also
--------
Series.isnull : Alias of isna.
Series.notna : Boolean inverse of isna.
Series.dropna : Omit axes labels with missing values.
isna : Top-level isna.
Examples
--------
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.isna()
0 False
1 False
2 True
dtype: bool
- items(self) -> 'Iterable[tuple[Hashable, Any]]'
- Lazily iterate over (index, value) tuples.
This method returns an iterable tuple (index, value). This is
convenient if you want to create a lazy iterator.
Returns
-------
iterable
Iterable of tuples containing the (index, value) pairs from a
Series.
See Also
--------
DataFrame.items : Iterate over (column name, Series) pairs.
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series) pairs.
Examples
--------
>>> s = pd.Series(['A', 'B', 'C'])
>>> for index, value in s.items():
... print(f"Index : {index}, Value : {value}")
Index : 0, Value : A
Index : 1, Value : B
Index : 2, Value : C
- iteritems(self) -> 'Iterable[tuple[Hashable, Any]]'
- Lazily iterate over (index, value) tuples.
This method returns an iterable tuple (index, value). This is
convenient if you want to create a lazy iterator.
Returns
-------
iterable
Iterable of tuples containing the (index, value) pairs from a
Series.
See Also
--------
DataFrame.items : Iterate over (column name, Series) pairs.
DataFrame.iterrows : Iterate over DataFrame rows as (index, Series) pairs.
Examples
--------
>>> s = pd.Series(['A', 'B', 'C'])
>>> for index, value in s.items():
... print(f"Index : {index}, Value : {value}")
Index : 0, Value : A
Index : 1, Value : B
Index : 2, Value : C
- keys(self) -> 'Index'
- Return alias for index.
Returns
-------
Index
Index of the Series.
- kurt(self, axis: 'Axis | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher's definition of
kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
- kurtosis = kurt(self, axis: 'Axis | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- le(self, other, level=None, fill_value=None, axis=0)
- Return Less than or equal to of series and other, element-wise (binary operator `le`).
Equivalent to ``series <= other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
e 1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a 0.0
b 1.0
c 2.0
d NaN
f 1.0
dtype: float64
>>> a.le(b, fill_value=0)
a False
b True
c True
d False
e False
f True
dtype: bool
- lt(self, other, level=None, fill_value=None, axis=0)
- Return Less than of series and other, element-wise (binary operator `lt`).
Equivalent to ``series < other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
e 1.0
dtype: float64
>>> b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
>>> b
a 0.0
b 1.0
c 2.0
d NaN
f 1.0
dtype: float64
>>> a.lt(b, fill_value=0)
a False
b False
c True
d False
e False
f True
dtype: bool
- mad(self, axis=None, skipna=True, level=None)
- Return the mean absolute deviation of the values over the requested axis.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
Returns
-------
scalar or Series (if level specified)
- map(self, arg, na_action=None) -> 'Series'
- Map values of Series according to an input mapping or function.
Used for substituting each value in a Series with another value,
that may be derived from a function, a ``dict`` or
a :class:`Series`.
Parameters
----------
arg : function, collections.abc.Mapping subclass or Series
Mapping correspondence.
na_action : {None, 'ignore'}, default None
If 'ignore', propagate NaN values, without passing them to the
mapping correspondence.
Returns
-------
Series
Same index as caller.
See Also
--------
Series.apply : For applying more complex functions on a Series.
DataFrame.apply : Apply a function row-/column-wise.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
Notes
-----
When ``arg`` is a dictionary, values in Series that are not in the
dictionary (as keys) are converted to ``NaN``. However, if the
dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
provides a method for default values), then this default is used
rather than ``NaN``.
Examples
--------
>>> s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
>>> s
0 cat
1 dog
2 NaN
3 rabbit
dtype: object
``map`` accepts a ``dict`` or a ``Series``. Values that are not found
in the ``dict`` are converted to ``NaN``, unless the dict has a default
value (e.g. ``defaultdict``):
>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0 kitten
1 puppy
2 NaN
3 NaN
dtype: object
It also accepts a function:
>>> s.map('I am a {}'.format)
0 I am a cat
1 I am a dog
2 I am a nan
3 I am a rabbit
dtype: object
To avoid applying the function to missing values (and keep them as
``NaN``) ``na_action='ignore'`` can be used:
>>> s.map('I am a {}'.format, na_action='ignore')
0 I am a cat
1 I am a dog
2 NaN
3 I am a rabbit
dtype: object
- mask(self, cond, other=nan, inplace=False, axis=None, level=None, errors=<no_default>, try_cast=<no_default>)
- Replace values where the condition is True.
Parameters
----------
cond : bool Series/DataFrame, array-like, or callable
Where `cond` is False, keep the original value. Where
True, replace with corresponding value from `other`.
If `cond` is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn't check it).
other : scalar, Series/DataFrame, or callable
Entries where `cond` is True are replaced with
corresponding value from `other`.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn't check it).
inplace : bool, default False
Whether to perform the operation in place on the data.
axis : int, default None
Alignment axis if needed.
level : int, default None
Alignment level if needed.
errors : str, {'raise', 'ignore'}, default 'raise'
Note that currently this parameter won't affect
the results and will always coerce to a suitable dtype.
- 'raise' : allow exceptions to be raised.
- 'ignore' : suppress exceptions. On error return original object.
try_cast : bool, default None
Try to cast the result back to the input type (if possible).
.. deprecated:: 1.3.0
Manually cast back if necessary.
Returns
-------
Same type as caller or None if ``inplace=True``.
See Also
--------
:func:`DataFrame.where` : Return an object of same shape as
self.
Notes
-----
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if ``cond`` is ``False`` the
element is used; otherwise the corresponding element from the DataFrame
``other`` is used.
The signature for :func:`DataFrame.where` differs from
:func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
``np.where(m, df1, df2)``.
For further details and examples see the ``mask`` documentation in
:ref:`indexing <indexing.where_mask>`.
Examples
--------
>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
>>> s.mask(s > 0)
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
>>> s.where(s > 1, 10)
0 10
1 10
2 2
3 3
4 4
dtype: int64
>>> s.mask(s > 1, 10)
0 0
1 1
2 10
3 10
4 10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df)
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> df.where(m, -df) == np.where(m, df, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> df.where(m, -df) == df.mask(~m, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
- max(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the maximum of the values over the requested axis.
If you want the *index* of the maximum, use ``idxmax``. This is the equivalent of the ``numpy.ndarray`` method ``argmax``.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
>>> idx = pd.MultiIndex.from_arrays([
... ['warm', 'warm', 'cold', 'cold'],
... ['dog', 'falcon', 'fish', 'spider']],
... names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded animal
warm dog 4
falcon 2
cold fish 0
spider 8
Name: legs, dtype: int64
>>> s.max()
8
- mean(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the mean of the values over the requested axis.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
- median(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the median of the values over the requested axis.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
- memory_usage(self, index: 'bool' = True, deep: 'bool' = False) -> 'int'
- Return the memory usage of the Series.
The memory usage can optionally include the contribution of
the index and of elements of `object` dtype.
Parameters
----------
index : bool, default True
Specifies whether to include the memory usage of the Series index.
deep : bool, default False
If True, introspect the data deeply by interrogating
`object` dtypes for system-level memory consumption, and include
it in the returned value.
Returns
-------
int
Bytes of memory consumed.
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
DataFrame.memory_usage : Bytes consumed by a DataFrame.
Examples
--------
>>> s = pd.Series(range(3))
>>> s.memory_usage()
152
Not including the index gives the size of the rest of the data, which
is necessarily smaller:
>>> s.memory_usage(index=False)
24
The memory footprint of `object` values is ignored by default:
>>> s = pd.Series(["a", "b"])
>>> s.values
array(['a', 'b'], dtype=object)
>>> s.memory_usage()
144
>>> s.memory_usage(deep=True)
244
- min(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return the minimum of the values over the requested axis.
If you want the *index* of the minimum, use ``idxmin``. This is the equivalent of the ``numpy.ndarray`` method ``argmin``.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
>>> idx = pd.MultiIndex.from_arrays([
... ['warm', 'warm', 'cold', 'cold'],
... ['dog', 'falcon', 'fish', 'spider']],
... names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded animal
warm dog 4
falcon 2
cold fish 0
spider 8
Name: legs, dtype: int64
>>> s.min()
0
- mod(self, other, level=None, fill_value=None, axis=0)
- Return Modulo of series and other, element-wise (binary operator `mod`).
Equivalent to ``series % other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.rmod : Reverse of the Modulo operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.mod(b, fill_value=0)
a 0.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
- mode(self, dropna: 'bool' = True) -> 'Series'
- Return the mode(s) of the Series.
The mode is the value that appears most often. There can be multiple modes.
Always returns Series even if only one value is returned.
Parameters
----------
dropna : bool, default True
Don't consider counts of NaN/NaT.
Returns
-------
Series
Modes of the Series in sorted order.
- mul(self, other, level=None, fill_value=None, axis=0)
- Return Multiplication of series and other, element-wise (binary operator `mul`).
Equivalent to ``series * other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.rmul : Reverse of the Multiplication operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.multiply(b, fill_value=0)
a 1.0
b 0.0
c 0.0
d 0.0
e NaN
dtype: float64
- multiply = mul(self, other, level=None, fill_value=None, axis=0)
- ne(self, other, level=None, fill_value=None, axis=0)
- Return Not equal to of series and other, element-wise (binary operator `ne`).
Equivalent to ``series != other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.ne(b, fill_value=0)
a False
b True
c True
d True
e True
dtype: bool
- nlargest(self, n=5, keep='first') -> 'Series'
- Return the largest `n` elements.
Parameters
----------
n : int, default 5
Return this many descending sorted values.
keep : {'first', 'last', 'all'}, default 'first'
When there are duplicate values that cannot all fit in a
Series of `n` elements:
- ``first`` : return the first `n` occurrences in order
of appearance.
- ``last`` : return the last `n` occurrences in reverse
order of appearance.
- ``all`` : keep all occurrences. This can result in a Series of
size larger than `n`.
Returns
-------
Series
The `n` largest values in the Series, sorted in decreasing order.
See Also
--------
Series.nsmallest: Get the `n` smallest elements.
Series.sort_values: Sort Series by values.
Series.head: Return the first `n` rows.
Notes
-----
Faster than ``.sort_values(ascending=False).head(n)`` for small `n`
relative to the size of the ``Series`` object.
Examples
--------
>>> countries_population = {"Italy": 59000000, "France": 65000000,
... "Malta": 434000, "Maldives": 434000,
... "Brunei": 434000, "Iceland": 337000,
... "Nauru": 11300, "Tuvalu": 11300,
... "Anguilla": 11300, "Montserrat": 5200}
>>> s = pd.Series(countries_population)
>>> s
Italy 59000000
France 65000000
Malta 434000
Maldives 434000
Brunei 434000
Iceland 337000
Nauru 11300
Tuvalu 11300
Anguilla 11300
Montserrat 5200
dtype: int64
The `n` largest elements where ``n=5`` by default.
>>> s.nlargest()
France 65000000
Italy 59000000
Malta 434000
Maldives 434000
Brunei 434000
dtype: int64
The `n` largest elements where ``n=3``. Default `keep` value is 'first'
so Malta will be kept.
>>> s.nlargest(3)
France 65000000
Italy 59000000
Malta 434000
dtype: int64
The `n` largest elements where ``n=3`` and keeping the last duplicates.
Brunei will be kept since it is the last with value 434000 based on
the index order.
>>> s.nlargest(3, keep='last')
France 65000000
Italy 59000000
Brunei 434000
dtype: int64
The `n` largest elements where ``n=3`` with all duplicates kept. Note
that the returned Series has five elements due to the three duplicates.
>>> s.nlargest(3, keep='all')
France 65000000
Italy 59000000
Malta 434000
Maldives 434000
Brunei 434000
dtype: int64
- notna(self) -> 'Series'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to False
values.
Returns
-------
Series
Mask of bool values for each element in Series that
indicates whether an element is not an NA value.
See Also
--------
Series.notnull : Alias of notna.
Series.isna : Boolean inverse of notna.
Series.dropna : Omit axes labels with missing values.
notna : Top-level notna.
Examples
--------
Show which entries in a DataFrame are not NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.notna()
age born name toy
0 True False True False
1 True True True True
2 False True True True
Show which entries in a Series are not NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.notna()
0 True
1 True
2 False
dtype: bool
- notnull(self) -> 'Series'
- Series.notnull is an alias for Series.notna.
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to False
values.
Returns
-------
Series
Mask of bool values for each element in Series that
indicates whether an element is not an NA value.
See Also
--------
Series.notnull : Alias of notna.
Series.isna : Boolean inverse of notna.
Series.dropna : Omit axes labels with missing values.
notna : Top-level notna.
Examples
--------
Show which entries in a DataFrame are not NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
... born=[pd.NaT, pd.Timestamp('1939-05-27'),
... pd.Timestamp('1940-04-25')],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker']))
>>> df
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
>>> df.notna()
age born name toy
0 True False True False
1 True True True True
2 False True True True
Show which entries in a Series are not NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0 5.0
1 6.0
2 NaN
dtype: float64
>>> ser.notna()
0 True
1 True
2 False
dtype: bool
- nsmallest(self, n: 'int' = 5, keep: 'str' = 'first') -> 'Series'
- Return the smallest `n` elements.
Parameters
----------
n : int, default 5
Return this many ascending sorted values.
keep : {'first', 'last', 'all'}, default 'first'
When there are duplicate values that cannot all fit in a
Series of `n` elements:
- ``first`` : return the first `n` occurrences in order
of appearance.
- ``last`` : return the last `n` occurrences in reverse
order of appearance.
- ``all`` : keep all occurrences. This can result in a Series of
size larger than `n`.
Returns
-------
Series
The `n` smallest values in the Series, sorted in increasing order.
See Also
--------
Series.nlargest: Get the `n` largest elements.
Series.sort_values: Sort Series by values.
Series.head: Return the first `n` rows.
Notes
-----
Faster than ``.sort_values().head(n)`` for small `n` relative to
the size of the ``Series`` object.
Examples
--------
>>> countries_population = {"Italy": 59000000, "France": 65000000,
... "Brunei": 434000, "Malta": 434000,
... "Maldives": 434000, "Iceland": 337000,
... "Nauru": 11300, "Tuvalu": 11300,
... "Anguilla": 11300, "Montserrat": 5200}
>>> s = pd.Series(countries_population)
>>> s
Italy 59000000
France 65000000
Brunei 434000
Malta 434000
Maldives 434000
Iceland 337000
Nauru 11300
Tuvalu 11300
Anguilla 11300
Montserrat 5200
dtype: int64
The `n` smallest elements where ``n=5`` by default.
>>> s.nsmallest()
Montserrat 5200
Nauru 11300
Tuvalu 11300
Anguilla 11300
Iceland 337000
dtype: int64
The `n` smallest elements where ``n=3``. Default `keep` value is
'first' so Nauru and Tuvalu will be kept.
>>> s.nsmallest(3)
Montserrat 5200
Nauru 11300
Tuvalu 11300
dtype: int64
The `n` smallest elements where ``n=3`` and keeping the last
duplicates. Anguilla and Tuvalu will be kept since they are the last
with value 11300 based on the index order.
>>> s.nsmallest(3, keep='last')
Montserrat 5200
Anguilla 11300
Tuvalu 11300
dtype: int64
The `n` smallest elements where ``n=3`` with all duplicates kept. Note
that the returned Series has four elements due to the three duplicates.
>>> s.nsmallest(3, keep='all')
Montserrat 5200
Nauru 11300
Tuvalu 11300
Anguilla 11300
dtype: int64
- pop(self, item: 'Hashable') -> 'Any'
- Return item and drops from series. Raise KeyError if not found.
Parameters
----------
item : label
Index of the element that needs to be removed.
Returns
-------
Value that is popped from series.
Examples
--------
>>> ser = pd.Series([1,2,3])
>>> ser.pop(0)
1
>>> ser
1 2
2 3
dtype: int64
- pow(self, other, level=None, fill_value=None, axis=0)
- Return Exponential power of series and other, element-wise (binary operator `pow`).
Equivalent to ``series ** other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.rpow : Reverse of the Exponential power operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.pow(b, fill_value=0)
a 1.0
b 1.0
c 1.0
d 0.0
e NaN
dtype: float64
- prod(self, axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
- Return the product of the values over the requested axis.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
By default, the product of an empty or all-NA Series is ``1``
>>> pd.Series([], dtype="float64").prod()
1.0
This can be controlled with the ``min_count`` parameter
>>> pd.Series([], dtype="float64").prod(min_count=1)
nan
Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
empty series identically.
>>> pd.Series([np.nan]).prod()
1.0
>>> pd.Series([np.nan]).prod(min_count=1)
nan
- product = prod(self, axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
- quantile(self, q=0.5, interpolation='linear')
- Return value at the given quantile.
Parameters
----------
q : float or array-like, default 0.5 (50% quantile)
The quantile(s) to compute, which can lie in range: 0 <= q <= 1.
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points `i` and `j`:
* linear: `i + (j - i) * fraction`, where `fraction` is the
fractional part of the index surrounded by `i` and `j`.
* lower: `i`.
* higher: `j`.
* nearest: `i` or `j` whichever is nearest.
* midpoint: (`i` + `j`) / 2.
Returns
-------
float or Series
If ``q`` is an array, a Series will be returned where the
index is ``q`` and the values are the quantiles, otherwise
a float will be returned.
See Also
--------
core.window.Rolling.quantile : Calculate the rolling quantile.
numpy.percentile : Returns the q-th percentile(s) of the array elements.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s.quantile(.5)
2.5
>>> s.quantile([.25, .5, .75])
0.25 1.75
0.50 2.50
0.75 3.25
dtype: float64
- radd(self, other, level=None, fill_value=None, axis=0)
- Return Addition of series and other, element-wise (binary operator `radd`).
Equivalent to ``other + series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.add : Element-wise Addition, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.add(b, fill_value=0)
a 2.0
b 1.0
c 1.0
d 1.0
e NaN
dtype: float64
- ravel(self, order='C')
- Return the flattened underlying data as an ndarray.
Returns
-------
numpy.ndarray or ndarray-like
Flattened data of the Series.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- rdiv = rtruediv(self, other, level=None, fill_value=None, axis=0)
- rdivmod(self, other, level=None, fill_value=None, axis=0)
- Return Integer division and modulo of series and other, element-wise (binary operator `rdivmod`).
Equivalent to ``other divmod series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
2-Tuple of Series
The result of the operation.
See Also
--------
Series.divmod : Element-wise Integer division and modulo, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.divmod(b, fill_value=0)
(a 1.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64,
a 0.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64)
- reindex(self, *args, **kwargs) -> 'Series'
- Conform Series to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
``copy=False``.
Parameters
----------
index : array-like, optional
New labels / index to conform to, should be specified using
keywords. Preferably an Index object to avoid duplicating data.
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don't fill gaps
* pad / ffill: Propagate last valid observation forward to next
valid.
* backfill / bfill: Use next valid observation to fill gap.
* nearest: Use nearest valid observations to fill gap.
copy : bool, default True
Return a new object, even if the passed indexes are the same.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any
"compatible" value.
limit : int, default None
Maximum number of consecutive elements to forward or backward fill.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations most
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
Series with changed index.
See Also
--------
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex_like : Change to same indices as other DataFrame.
Examples
--------
``DataFrame.reindex`` supports two calling conventions
* ``(index=index_labels, columns=column_labels, ...)``
* ``(labels, axis={'index', 'columns'}, ...)``
We *highly* recommend using keyword arguments to clarify your
intent.
Create a dataframe with some fictional data.
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
... index=index)
>>> df
http_status response_time
Firefox 200 0.04
Chrome 200 0.02
Safari 404 0.07
IE10 404 0.08
Konqueror 301 1.00
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned ``NaN``.
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
... 'Chrome']
>>> df.reindex(new_index)
http_status response_time
Safari 404.0 0.07
Iceweasel NaN NaN
Comodo Dragon NaN NaN
IE10 404.0 0.08
Chrome 200.0 0.02
We can fill in the missing values by passing a value to
the keyword ``fill_value``. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
``method`` to fill the ``NaN`` values.
>>> df.reindex(new_index, fill_value=0)
http_status response_time
Safari 404 0.07
Iceweasel 0 0.00
Comodo Dragon 0 0.00
IE10 404 0.08
Chrome 200 0.02
>>> df.reindex(new_index, fill_value='missing')
http_status response_time
Safari 404 0.07
Iceweasel missing missing
Comodo Dragon missing missing
IE10 404 0.08
Chrome 200 0.02
We can also reindex the columns.
>>> df.reindex(columns=['http_status', 'user_agent'])
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
Or we can use "axis-style" keyword arguments
>>> df.reindex(['http_status', 'user_agent'], axis="columns")
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
To further illustrate the filling functionality in
``reindex``, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
... index=date_index)
>>> df2
prices
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
Suppose we decide to expand the dataframe to cover a wider
date range.
>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
prices
2009-12-29 NaN
2009-12-30 NaN
2009-12-31 NaN
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
The index entries that did not have a value in the original data frame
(for example, '2009-12-29') are by default filled with ``NaN``.
If desired, we can fill in the missing values using one of several
options.
For example, to back-propagate the last valid value to fill the ``NaN``
values, pass ``bfill`` as an argument to the ``method`` keyword.
>>> df2.reindex(date_index2, method='bfill')
prices
2009-12-29 100.0
2009-12-30 100.0
2009-12-31 100.0
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
Please note that the ``NaN`` value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the ``NaN`` values present
in the original dataframe, use the ``fillna()`` method.
See the :ref:`user guide <basics.reindexing>` for more.
- rename(self, index=None, *, axis=None, copy=True, inplace=False, level=None, errors='ignore') -> 'Series | None'
- Alter Series index labels or name.
Function / dict values must be unique (1-to-1). Labels not contained in
a dict / Series will be left as-is. Extra labels listed don't throw an
error.
Alternatively, change ``Series.name`` with a scalar value.
See the :ref:`user guide <basics.rename>` for more.
Parameters
----------
axis : {0 or "index"}
Unused. Accepted for compatibility with DataFrame method only.
index : scalar, hashable sequence, dict-like or function, optional
Functions or dict-like are transformations to apply to
the index.
Scalar or hashable sequence-like will alter the ``Series.name``
attribute.
**kwargs
Additional keyword arguments passed to the function. Only the
"inplace" keyword is used.
Returns
-------
Series or None
Series with index labels or name altered or None if ``inplace=True``.
See Also
--------
DataFrame.rename : Corresponding DataFrame method.
Series.rename_axis : Set the name of the axis.
Examples
--------
>>> s = pd.Series([1, 2, 3])
>>> s
0 1
1 2
2 3
dtype: int64
>>> s.rename("my_name") # scalar, changes Series.name
0 1
1 2
2 3
Name: my_name, dtype: int64
>>> s.rename(lambda x: x ** 2) # function, changes labels
0 1
1 2
4 3
dtype: int64
>>> s.rename({1: 3, 2: 5}) # mapping, changes labels
0 1
3 2
5 3
dtype: int64
- reorder_levels(self, order) -> 'Series'
- Rearrange index levels using input order.
May not drop or duplicate levels.
Parameters
----------
order : list of int representing new level order
Reference level by number or key.
Returns
-------
type of caller (new object)
- repeat(self, repeats, axis=None) -> 'Series'
- Repeat elements of a Series.
Returns a new Series where each element of the current Series
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Series.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
Series
Newly created Series with repeated elements.
See Also
--------
Index.repeat : Equivalent function for Index.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> s = pd.Series(['a', 'b', 'c'])
>>> s
0 a
1 b
2 c
dtype: object
>>> s.repeat(2)
0 a
0 a
1 b
1 b
2 c
2 c
dtype: object
>>> s.repeat([1, 2, 3])
0 a
1 b
1 b
2 c
2 c
2 c
dtype: object
- replace(self, to_replace=None, value=<no_default>, inplace=False, limit=None, regex=False, method: 'str | lib.NoDefault' = <no_default>)
- Replace values given in `to_replace` with `value`.
Values of the Series are replaced with other values dynamically.
This differs from updating with ``.loc`` or ``.iloc``, which require
you to specify a location to update with some value.
Parameters
----------
to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
* numeric, str or regex:
- numeric: numeric values equal to `to_replace` will be
replaced with `value`
- str: string exactly matching `to_replace` will be replaced
with `value`
- regex: regexs matching `to_replace` will be replaced with
`value`
* list of str, regex, or numeric:
- First, if `to_replace` and `value` are both lists, they
**must** be the same length.
- Second, if ``regex=True`` then all of the strings in **both**
lists will be interpreted as regexs otherwise they will match
directly. This doesn't matter much for `value` since there
are only a few possible substitution regexes you can use.
- str, regex and numeric rules apply as above.
* dict:
- Dicts can be used to specify different replacement values
for different existing values. For example,
``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
'y' with 'z'. To use a dict in this way the `value`
parameter should be `None`.
- For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
and the value 'z' in column 'b' and replaces these values
with whatever is specified in `value`. The `value` parameter
should not be ``None`` in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
- For a DataFrame nested dictionaries, e.g.,
``{'a': {'b': np.nan}}``, are read as follows: look in column
'a' for the value 'b' and replace it with NaN. The `value`
parameter should be ``None`` to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) **cannot** be regular expressions.
* None:
- This means that the `regex` argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If `value` is also ``None`` then
this **must** be a nested dictionary or Series.
See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
Value to replace any values matching `to_replace` with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
objects are also allowed.
inplace : bool, default False
If True, performs operation inplace and returns None.
limit : int, default None
Maximum size gap to forward or backward fill.
regex : bool or same types as `to_replace`, default False
Whether to interpret `to_replace` and/or `value` as regular
expressions. If this is ``True`` then `to_replace` *must* be a
string. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
`to_replace` must be ``None``.
method : {'pad', 'ffill', 'bfill', `None`}
The method to use when for replacement, when `to_replace` is a
scalar, list or tuple and `value` is ``None``.
.. versionchanged:: 0.23.0
Added to DataFrame.
Returns
-------
Series
Object after replacement.
Raises
------
AssertionError
* If `regex` is not a ``bool`` and `to_replace` is not
``None``.
TypeError
* If `to_replace` is not a scalar, array-like, ``dict``, or ``None``
* If `to_replace` is a ``dict`` and `value` is not a ``list``,
``dict``, ``ndarray``, or ``Series``
* If `to_replace` is ``None`` and `regex` is not compilable
into a regular expression or is a list, dict, ndarray, or
Series.
* When replacing multiple ``bool`` or ``datetime64`` objects and
the arguments to `to_replace` does not match the type of the
value being replaced
ValueError
* If a ``list`` or an ``ndarray`` is passed to `to_replace` and
`value` but they are not the same length.
See Also
--------
Series.fillna : Fill NA values.
Series.where : Replace values based on boolean condition.
Series.str.replace : Simple string replacement.
Notes
-----
* Regex substitution is performed under the hood with ``re.sub``. The
rules for substitution for ``re.sub`` are the same.
* Regular expressions will only substitute on strings, meaning you
cannot provide, for example, a regular expression matching floating
point numbers and expect the columns in your frame that have a
numeric dtype to be matched. However, if those floating point
numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
* When dict is used as the `to_replace` value, it is like
key(s) in the dict are the to_replace part and
value(s) in the dict are the value parameter.
Examples
--------
**Scalar `to_replace` and `value`**
>>> s = pd.Series([1, 2, 3, 4, 5])
>>> s.replace(1, 5)
0 5
1 2
2 3
3 4
4 5
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
... 'B': [5, 6, 7, 8, 9],
... 'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
A B C
0 5 5 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e
**List-like `to_replace`**
>>> df.replace([0, 1, 2, 3], 4)
A B C
0 4 5 a
1 4 6 b
2 4 7 c
3 4 8 d
4 4 9 e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
A B C
0 4 5 a
1 3 6 b
2 2 7 c
3 1 8 d
4 4 9 e
>>> s.replace([1, 2], method='bfill')
0 3
1 3
2 3
3 4
4 5
dtype: int64
**dict-like `to_replace`**
>>> df.replace({0: 10, 1: 100})
A B C
0 10 5 a
1 100 6 b
2 2 7 c
3 3 8 d
4 4 9 e
>>> df.replace({'A': 0, 'B': 5}, 100)
A B C
0 100 100 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e
>>> df.replace({'A': {0: 100, 4: 400}})
A B C
0 100 5 a
1 1 6 b
2 2 7 c
3 3 8 d
4 400 9 e
**Regular expression `to_replace`**
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
... 'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
A B
0 new abc
1 foo new
2 bait xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
A B
0 new abc
1 foo bar
2 bait xyz
>>> df.replace(regex=r'^ba.$', value='new')
A B
0 new abc
1 foo new
2 bait xyz
>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
A B
0 new abc
1 xyz new
2 bait xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
A B
0 new abc
1 new new
2 bait xyz
Compare the behavior of ``s.replace({'a': None})`` and
``s.replace('a', None)`` to understand the peculiarities
of the `to_replace` parameter:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the `to_replace` value, it is like the
value(s) in the dict are equal to the `value` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:
>>> s.replace({'a': None})
0 10
1 None
2 None
3 b
4 None
dtype: object
When ``value`` is not explicitly passed and `to_replace` is a scalar, list
or tuple, `replace` uses the method parameter (default 'pad') to do the
replacement. So this is why the 'a' values are being replaced by 10
in rows 1 and 2 and 'b' in row 4 in this case.
>>> s.replace('a')
0 10
1 10
2 10
3 b
4 b
dtype: object
On the other hand, if ``None`` is explicitly passed for ``value``, it will
be respected:
>>> s.replace('a', None)
0 10
1 None
2 None
3 b
4 None
dtype: object
.. versionchanged:: 1.4.0
Previously the explicit ``None`` was silently ignored.
- resample(self, rule, axis=0, closed: 'str | None' = None, label: 'str | None' = None, convention: 'str' = 'start', kind: 'str | None' = None, loffset=None, base: 'int | None' = None, on=None, level=None, origin: 'str | TimestampConvertibleTypes' = 'start_day', offset: 'TimedeltaConvertibleTypes | None' = None) -> 'Resampler'
- Resample time-series data.
Convenience method for frequency conversion and resampling of time series.
The object must have a datetime-like index (`DatetimeIndex`, `PeriodIndex`,
or `TimedeltaIndex`), or the caller must pass the label of a datetime-like
series/index to the ``on``/``level`` keyword parameter.
Parameters
----------
rule : DateOffset, Timedelta or str
The offset string or object representing target conversion.
axis : {0 or 'index', 1 or 'columns'}, default 0
Which axis to use for up- or down-sampling. For `Series` this
will default to 0, i.e. along the rows. Must be
`DatetimeIndex`, `TimedeltaIndex` or `PeriodIndex`.
closed : {'right', 'left'}, default None
Which side of bin interval is closed. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
label : {'right', 'left'}, default None
Which bin edge label to label bucket with. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
convention : {'start', 'end', 's', 'e'}, default 'start'
For `PeriodIndex` only, controls whether to use the start or
end of `rule`.
kind : {'timestamp', 'period'}, optional, default None
Pass 'timestamp' to convert the resulting index to a
`DateTimeIndex` or 'period' to convert it to a `PeriodIndex`.
By default the input representation is retained.
loffset : timedelta, default None
Adjust the resampled time labels.
.. deprecated:: 1.1.0
You should add the loffset to the `df.index` after the resample.
See below.
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0.
.. deprecated:: 1.1.0
The new arguments that you should use are 'offset' or 'origin'.
on : str, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.
level : str or int, optional
For a MultiIndex, level (name or number) to use for
resampling. `level` must be datetime-like.
origin : Timestamp or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin
must match the timezone of the index.
If string, must be one of the following:
- 'epoch': `origin` is 1970-01-01
- 'start': `origin` is the first value of the timeseries
- 'start_day': `origin` is the first day at midnight of the timeseries
.. versionadded:: 1.1.0
- 'end': `origin` is the last value of the timeseries
- 'end_day': `origin` is the ceiling midnight of the last day
.. versionadded:: 1.3.0
offset : Timedelta or str, default is None
An offset timedelta added to the origin.
.. versionadded:: 1.1.0
Returns
-------
pandas.core.Resampler
:class:`~pandas.core.Resampler` object.
See Also
--------
Series.resample : Resample a Series.
DataFrame.resample : Resample a DataFrame.
groupby : Group Series by mapping, function, label, or list of labels.
asfreq : Reindex a Series with the given frequency without grouping.
Notes
-----
See the `user guide
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling>`__
for more.
To learn more about the offset strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects>`__.
Examples
--------
Start by creating a series with 9 one minute timestamps.
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00 0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
2000-01-01 00:07:00 7
2000-01-01 00:08:00 8
Freq: T, dtype: int64
Downsample the series into 3 minute bins and sum the values
of the timestamps falling into a bin.
>>> series.resample('3T').sum()
2000-01-01 00:00:00 3
2000-01-01 00:03:00 12
2000-01-01 00:06:00 21
Freq: 3T, dtype: int64
Downsample the series into 3 minute bins as above, but label each
bin using the right edge instead of the left. Please note that the
value in the bucket used as the label is not included in the bucket,
which it labels. For example, in the original series the
bucket ``2000-01-01 00:03:00`` contains the value 3, but the summed
value in the resampled bucket with the label ``2000-01-01 00:03:00``
does not include 3 (if it did, the summed value would be 6, not 3).
To include this value close the right side of the bin interval as
illustrated in the example below this one.
>>> series.resample('3T', label='right').sum()
2000-01-01 00:03:00 3
2000-01-01 00:06:00 12
2000-01-01 00:09:00 21
Freq: 3T, dtype: int64
Downsample the series into 3 minute bins as above, but close the right
side of the bin interval.
>>> series.resample('3T', label='right', closed='right').sum()
2000-01-01 00:00:00 0
2000-01-01 00:03:00 6
2000-01-01 00:06:00 15
2000-01-01 00:09:00 15
Freq: 3T, dtype: int64
Upsample the series into 30 second bins.
>>> series.resample('30S').asfreq()[0:5] # Select first 5 rows
2000-01-01 00:00:00 0.0
2000-01-01 00:00:30 NaN
2000-01-01 00:01:00 1.0
2000-01-01 00:01:30 NaN
2000-01-01 00:02:00 2.0
Freq: 30S, dtype: float64
Upsample the series into 30 second bins and fill the ``NaN``
values using the ``pad`` method.
>>> series.resample('30S').pad()[0:5]
2000-01-01 00:00:00 0
2000-01-01 00:00:30 0
2000-01-01 00:01:00 1
2000-01-01 00:01:30 1
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
Upsample the series into 30 second bins and fill the
``NaN`` values using the ``bfill`` method.
>>> series.resample('30S').bfill()[0:5]
2000-01-01 00:00:00 0
2000-01-01 00:00:30 1
2000-01-01 00:01:00 1
2000-01-01 00:01:30 2
2000-01-01 00:02:00 2
Freq: 30S, dtype: int64
Pass a custom function via ``apply``
>>> def custom_resampler(arraylike):
... return np.sum(arraylike) + 5
...
>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00 8
2000-01-01 00:03:00 17
2000-01-01 00:06:00 26
Freq: 3T, dtype: int64
For a Series with a PeriodIndex, the keyword `convention` can be
used to control whether to use the start or end of `rule`.
Resample a year by quarter using 'start' `convention`. Values are
assigned to the first quarter of the period.
>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
... freq='A',
... periods=2))
>>> s
2012 1
2013 2
Freq: A-DEC, dtype: int64
>>> s.resample('Q', convention='start').asfreq()
2012Q1 1.0
2012Q2 NaN
2012Q3 NaN
2012Q4 NaN
2013Q1 2.0
2013Q2 NaN
2013Q3 NaN
2013Q4 NaN
Freq: Q-DEC, dtype: float64
Resample quarters by month using 'end' `convention`. Values are
assigned to the last month of the period.
>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
... freq='Q',
... periods=4))
>>> q
2018Q1 1
2018Q2 2
2018Q3 3
2018Q4 4
Freq: Q-DEC, dtype: int64
>>> q.resample('M', convention='end').asfreq()
2018-03 1.0
2018-04 NaN
2018-05 NaN
2018-06 2.0
2018-07 NaN
2018-08 NaN
2018-09 3.0
2018-10 NaN
2018-11 NaN
2018-12 4.0
Freq: M, dtype: float64
For DataFrame objects, the keyword `on` can be used to specify the
column instead of the index for resampling.
>>> d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
... periods=8,
... freq='W')
>>> df
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
>>> df.resample('M', on='week_starting').mean()
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0
For a DataFrame with MultiIndex, the keyword `level` can be used to
specify on which level the resampling needs to take place.
>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(
... d2,
... index=pd.MultiIndex.from_product(
... [days, ['morning', 'afternoon']]
... )
... )
>>> df2
price volume
2000-01-01 morning 10 50
afternoon 11 60
2000-01-02 morning 9 40
afternoon 13 100
2000-01-03 morning 14 50
afternoon 18 100
2000-01-04 morning 17 40
afternoon 19 50
>>> df2.resample('D', level=0).sum()
price volume
2000-01-01 21 110
2000-01-02 22 140
2000-01-03 32 150
2000-01-04 36 90
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7T, dtype: int64
>>> ts.resample('17min').sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='epoch').sum()
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17T, dtype: int64
>>> ts.resample('17min', origin='2000-01-01').sum()
2000-10-01 23:24:00 3
2000-10-01 23:41:00 15
2000-10-01 23:58:00 45
2000-10-02 00:15:00 45
Freq: 17T, dtype: int64
If you want to adjust the start of the bins with an `offset` Timedelta, the two
following lines are equivalent:
>>> ts.resample('17min', origin='start').sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
>>> ts.resample('17min', offset='23h30min').sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
If you want to take the largest Timestamp as the end of the bins:
>>> ts.resample('17min', origin='end').sum()
2000-10-01 23:35:00 0
2000-10-01 23:52:00 18
2000-10-02 00:09:00 27
2000-10-02 00:26:00 63
Freq: 17T, dtype: int64
In contrast with the `start_day`, you can use `end_day` to take the ceiling
midnight of the largest Timestamp as the end of the bins and drop the bins
not containing data:
>>> ts.resample('17min', origin='end_day').sum()
2000-10-01 23:38:00 3
2000-10-01 23:55:00 15
2000-10-02 00:12:00 45
2000-10-02 00:29:00 45
Freq: 17T, dtype: int64
To replace the use of the deprecated `base` argument, you can now use `offset`,
in this example it is equivalent to have `base=2`:
>>> ts.resample('17min', offset='2min').sum()
2000-10-01 23:16:00 0
2000-10-01 23:33:00 9
2000-10-01 23:50:00 36
2000-10-02 00:07:00 39
2000-10-02 00:24:00 24
Freq: 17T, dtype: int64
To replace the use of the deprecated `loffset` argument:
>>> from pandas.tseries.frequencies import to_offset
>>> loffset = '19min'
>>> ts_out = ts.resample('17min').sum()
>>> ts_out.index = ts_out.index + to_offset(loffset)
>>> ts_out
2000-10-01 23:33:00 0
2000-10-01 23:50:00 9
2000-10-02 00:07:00 21
2000-10-02 00:24:00 54
2000-10-02 00:41:00 24
Freq: 17T, dtype: int64
- reset_index(self, level=None, drop=False, name=<no_default>, inplace=False)
- Generate a new DataFrame or Series with the index reset.
This is useful when the index needs to be treated as a column, or
when the index is meaningless and needs to be reset to the default
before another operation.
Parameters
----------
level : int, str, tuple, or list, default optional
For a Series with a MultiIndex, only remove the specified levels
from the index. Removes all levels by default.
drop : bool, default False
Just reset the index, without inserting it as a column in
the new DataFrame.
name : object, optional
The name to use for the column containing the original Series
values. Uses ``self.name`` by default. This argument is ignored
when `drop` is True.
inplace : bool, default False
Modify the Series in place (do not create a new object).
Returns
-------
Series or DataFrame or None
When `drop` is False (the default), a DataFrame is returned.
The newly created columns will come first in the DataFrame,
followed by the original Series values.
When `drop` is True, a `Series` is returned.
In either case, if ``inplace=True``, no value is returned.
See Also
--------
DataFrame.reset_index: Analogous function for DataFrame.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4], name='foo',
... index=pd.Index(['a', 'b', 'c', 'd'], name='idx'))
Generate a DataFrame with default index.
>>> s.reset_index()
idx foo
0 a 1
1 b 2
2 c 3
3 d 4
To specify the name of the new column use `name`.
>>> s.reset_index(name='values')
idx values
0 a 1
1 b 2
2 c 3
3 d 4
To generate a new Series with the default set `drop` to True.
>>> s.reset_index(drop=True)
0 1
1 2
2 3
3 4
Name: foo, dtype: int64
To update the Series in place, without generating a new one
set `inplace` to True. Note that it also requires ``drop=True``.
>>> s.reset_index(inplace=True, drop=True)
>>> s
0 1
1 2
2 3
3 4
Name: foo, dtype: int64
The `level` parameter is interesting for Series with a multi-level
index.
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
... np.array(['one', 'two', 'one', 'two'])]
>>> s2 = pd.Series(
... range(4), name='foo',
... index=pd.MultiIndex.from_arrays(arrays,
... names=['a', 'b']))
To remove a specific level from the Index, use `level`.
>>> s2.reset_index(level='a')
a foo
b
one bar 0
two bar 1
one baz 2
two baz 3
If `level` is not set, all levels are removed from the Index.
>>> s2.reset_index()
a b foo
0 bar one 0
1 bar two 1
2 baz one 2
3 baz two 3
- rfloordiv(self, other, level=None, fill_value=None, axis=0)
- Return Integer division of series and other, element-wise (binary operator `rfloordiv`).
Equivalent to ``other // series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.floordiv : Element-wise Integer division, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.floordiv(b, fill_value=0)
a 1.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
- rmod(self, other, level=None, fill_value=None, axis=0)
- Return Modulo of series and other, element-wise (binary operator `rmod`).
Equivalent to ``other % series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.mod : Element-wise Modulo, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.mod(b, fill_value=0)
a 0.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
- rmul(self, other, level=None, fill_value=None, axis=0)
- Return Multiplication of series and other, element-wise (binary operator `rmul`).
Equivalent to ``other * series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.mul : Element-wise Multiplication, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.multiply(b, fill_value=0)
a 1.0
b 0.0
c 0.0
d 0.0
e NaN
dtype: float64
- round(self, decimals=0, *args, **kwargs) -> 'Series'
- Round each value in a Series to the given number of decimals.
Parameters
----------
decimals : int, default 0
Number of decimal places to round to. If decimals is negative,
it specifies the number of positions to the left of the decimal point.
*args, **kwargs
Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
Returns
-------
Series
Rounded values of the Series.
See Also
--------
numpy.around : Round values of an np.array.
DataFrame.round : Round values of a DataFrame.
Examples
--------
>>> s = pd.Series([0.1, 1.3, 2.7])
>>> s.round()
0 0.0
1 1.0
2 3.0
dtype: float64
- rpow(self, other, level=None, fill_value=None, axis=0)
- Return Exponential power of series and other, element-wise (binary operator `rpow`).
Equivalent to ``other ** series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.pow : Element-wise Exponential power, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.pow(b, fill_value=0)
a 1.0
b 1.0
c 1.0
d 0.0
e NaN
dtype: float64
- rsub(self, other, level=None, fill_value=None, axis=0)
- Return Subtraction of series and other, element-wise (binary operator `rsub`).
Equivalent to ``other - series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.sub : Element-wise Subtraction, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.subtract(b, fill_value=0)
a 0.0
b 1.0
c 1.0
d -1.0
e NaN
dtype: float64
- rtruediv(self, other, level=None, fill_value=None, axis=0)
- Return Floating division of series and other, element-wise (binary operator `rtruediv`).
Equivalent to ``other / series``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.truediv : Element-wise Floating division, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a 1.0
b inf
c inf
d 0.0
e NaN
dtype: float64
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Series `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Series *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- sem(self, axis=None, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)
- Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument
Parameters
----------
axis : {index (0)}
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
scalar or Series (if level specified)
- set_axis(self, labels, axis: 'Axis' = 0, inplace: 'bool' = False)
- Assign desired index to given axis.
Indexes for row labels can be changed by assigning
a list-like or Index.
Parameters
----------
labels : list-like, Index
The values for the new index.
axis : {0 or 'index'}, default 0
The axis to update. The value 0 identifies the rows.
inplace : bool, default False
Whether to return a new Series instance.
Returns
-------
renamed : Series or None
An object of type Series or None if ``inplace=True``.
See Also
--------
Series.rename_axis : Alter the name of the index.
Examples
--------
>>> s = pd.Series([1, 2, 3])
>>> s
0 1
1 2
2 3
dtype: int64
>>> s.set_axis(['a', 'b', 'c'], axis=0)
a 1
b 2
c 3
dtype: int64
- shift(self, periods=1, freq=None, axis=0, fill_value=None) -> 'Series'
- Shift index by desired number of periods with an optional time `freq`.
When `freq` is not passed, shift the index without realigning the data.
If `freq` is passed (in this case, the index must be date or datetime,
or it will raise a `NotImplementedError`), the index will be
increased using the periods and the `freq`. `freq` can be inferred
when specified as "infer" as long as either freq or inferred_freq
attribute is set in the index.
Parameters
----------
periods : int
Number of periods to shift. Can be positive or negative.
freq : DateOffset, tseries.offsets, timedelta, or str, optional
Offset to use from the tseries module or time rule (e.g. 'EOM').
If `freq` is specified then the index values are shifted but the
data is not realigned. That is, use `freq` if you would like to
extend the index when shifting and preserve the original data.
If `freq` is specified as "infer" then it will be inferred from
the freq or inferred_freq attributes of the index. If neither of
those attributes exist, a ValueError is thrown.
axis : {0 or 'index', 1 or 'columns', None}, default None
Shift direction.
fill_value : object, optional
The scalar value to use for newly introduced missing values.
the default depends on the dtype of `self`.
For numeric data, ``np.nan`` is used.
For datetime, timedelta, or period data, etc. :attr:`NaT` is used.
For extension dtypes, ``self.dtype.na_value`` is used.
.. versionchanged:: 1.1.0
Returns
-------
Series
Copy of input object, shifted.
See Also
--------
Index.shift : Shift values of Index.
DatetimeIndex.shift : Shift values of DatetimeIndex.
PeriodIndex.shift : Shift values of PeriodIndex.
tshift : Shift the time index, using the index's frequency if
available.
Examples
--------
>>> df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
... "Col2": [13, 23, 18, 33, 48],
... "Col3": [17, 27, 22, 37, 52]},
... index=pd.date_range("2020-01-01", "2020-01-05"))
>>> df
Col1 Col2 Col3
2020-01-01 10 13 17
2020-01-02 20 23 27
2020-01-03 15 18 22
2020-01-04 30 33 37
2020-01-05 45 48 52
>>> df.shift(periods=3)
Col1 Col2 Col3
2020-01-01 NaN NaN NaN
2020-01-02 NaN NaN NaN
2020-01-03 NaN NaN NaN
2020-01-04 10.0 13.0 17.0
2020-01-05 20.0 23.0 27.0
>>> df.shift(periods=1, axis="columns")
Col1 Col2 Col3
2020-01-01 NaN 10 13
2020-01-02 NaN 20 23
2020-01-03 NaN 15 18
2020-01-04 NaN 30 33
2020-01-05 NaN 45 48
>>> df.shift(periods=3, fill_value=0)
Col1 Col2 Col3
2020-01-01 0 0 0
2020-01-02 0 0 0
2020-01-03 0 0 0
2020-01-04 10 13 17
2020-01-05 20 23 27
>>> df.shift(periods=3, freq="D")
Col1 Col2 Col3
2020-01-04 10 13 17
2020-01-05 20 23 27
2020-01-06 15 18 22
2020-01-07 30 33 37
2020-01-08 45 48 52
>>> df.shift(periods=3, freq="infer")
Col1 Col2 Col3
2020-01-04 10 13 17
2020-01-05 20 23 27
2020-01-06 15 18 22
2020-01-07 30 33 37
2020-01-08 45 48 52
- skew(self, axis: 'int | None | lib.NoDefault' = <no_default>, skipna=True, level=None, numeric_only=None, **kwargs)
- Return unbiased skew over requested axis.
Normalized by N-1.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
- sort_index(self, axis=0, level=None, ascending: 'bool | int | Sequence[bool | int]' = True, inplace: 'bool' = False, kind: 'str' = 'quicksort', na_position: 'str' = 'last', sort_remaining: 'bool' = True, ignore_index: 'bool' = False, key: 'IndexKeyFunc' = None)
- Sort Series by index labels.
Returns a new Series sorted by label if `inplace` argument is
``False``, otherwise updates the original series and returns None.
Parameters
----------
axis : int, default 0
Axis to direct sorting. This can only be 0 for Series.
level : int, optional
If not None, sort on values in specified index level(s).
ascending : bool or list-like of bools, default True
Sort ascending vs. descending. When the index is a MultiIndex the
sort direction can be controlled for each level individually.
inplace : bool, default False
If True, perform operation in-place.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'
Choice of sorting algorithm. See also :func:`numpy.sort` for more
information. 'mergesort' and 'stable' are the only stable algorithms. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
If 'first' puts NaNs at the beginning, 'last' puts NaNs at the end.
Not implemented for MultiIndex.
sort_remaining : bool, default True
If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.0.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
Series or None
The original Series sorted by the labels or None if ``inplace=True``.
See Also
--------
DataFrame.sort_index: Sort DataFrame by the index.
DataFrame.sort_values: Sort DataFrame by the value.
Series.sort_values : Sort Series by the value.
Examples
--------
>>> s = pd.Series(['a', 'b', 'c', 'd'], index=[3, 2, 1, 4])
>>> s.sort_index()
1 c
2 b
3 a
4 d
dtype: object
Sort Descending
>>> s.sort_index(ascending=False)
4 d
3 a
2 b
1 c
dtype: object
Sort Inplace
>>> s.sort_index(inplace=True)
>>> s
1 c
2 b
3 a
4 d
dtype: object
By default NaNs are put at the end, but use `na_position` to place
them at the beginning
>>> s = pd.Series(['a', 'b', 'c', 'd'], index=[3, 2, 1, np.nan])
>>> s.sort_index(na_position='first')
NaN d
1.0 c
2.0 b
3.0 a
dtype: object
Specify index level to sort
>>> arrays = [np.array(['qux', 'qux', 'foo', 'foo',
... 'baz', 'baz', 'bar', 'bar']),
... np.array(['two', 'one', 'two', 'one',
... 'two', 'one', 'two', 'one'])]
>>> s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=arrays)
>>> s.sort_index(level=1)
bar one 8
baz one 6
foo one 4
qux one 2
bar two 7
baz two 5
foo two 3
qux two 1
dtype: int64
Does not sort by remaining levels when sorting by levels
>>> s.sort_index(level=1, sort_remaining=False)
qux one 2
foo one 4
baz one 6
bar one 8
qux two 1
foo two 3
baz two 5
bar two 7
dtype: int64
Apply a key function before sorting
>>> s = pd.Series([1, 2, 3, 4], index=['A', 'b', 'C', 'd'])
>>> s.sort_index(key=lambda x : x.str.lower())
A 1
b 2
C 3
d 4
dtype: int64
- sort_values(self, axis=0, ascending: 'bool | int | Sequence[bool | int]' = True, inplace: 'bool' = False, kind: 'str' = 'quicksort', na_position: 'str' = 'last', ignore_index: 'bool' = False, key: 'ValueKeyFunc' = None)
- Sort by the values.
Sort a Series in ascending or descending order by some
criterion.
Parameters
----------
axis : {0 or 'index'}, default 0
Axis to direct sorting. The value 'index' is accepted for
compatibility with DataFrame.sort_values.
ascending : bool or list of bools, default True
If True, sort values in ascending order, otherwise descending.
inplace : bool, default False
If True, perform operation in-place.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, default 'quicksort'
Choice of sorting algorithm. See also :func:`numpy.sort` for more
information. 'mergesort' and 'stable' are the only stable algorithms.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.0.0
key : callable, optional
If not None, apply the key function to the series values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect a
``Series`` and return an array-like.
.. versionadded:: 1.1.0
Returns
-------
Series or None
Series ordered by values or None if ``inplace=True``.
See Also
--------
Series.sort_index : Sort by the Series indices.
DataFrame.sort_values : Sort DataFrame by the values along either axis.
DataFrame.sort_index : Sort DataFrame by indices.
Examples
--------
>>> s = pd.Series([np.nan, 1, 3, 10, 5])
>>> s
0 NaN
1 1.0
2 3.0
3 10.0
4 5.0
dtype: float64
Sort values ascending order (default behaviour)
>>> s.sort_values(ascending=True)
1 1.0
2 3.0
4 5.0
3 10.0
0 NaN
dtype: float64
Sort values descending order
>>> s.sort_values(ascending=False)
3 10.0
4 5.0
2 3.0
1 1.0
0 NaN
dtype: float64
Sort values inplace
>>> s.sort_values(ascending=False, inplace=True)
>>> s
3 10.0
4 5.0
2 3.0
1 1.0
0 NaN
dtype: float64
Sort values putting NAs first
>>> s.sort_values(na_position='first')
0 NaN
1 1.0
2 3.0
4 5.0
3 10.0
dtype: float64
Sort a series of strings
>>> s = pd.Series(['z', 'b', 'd', 'a', 'c'])
>>> s
0 z
1 b
2 d
3 a
4 c
dtype: object
>>> s.sort_values()
3 a
1 b
4 c
2 d
0 z
dtype: object
Sort using a key function. Your `key` function will be
given the ``Series`` of values and should return an array-like.
>>> s = pd.Series(['a', 'B', 'c', 'D', 'e'])
>>> s.sort_values()
1 B
3 D
0 a
2 c
4 e
dtype: object
>>> s.sort_values(key=lambda x: x.str.lower())
0 a
1 B
2 c
3 D
4 e
dtype: object
NumPy ufuncs work well here. For example, we can
sort by the ``sin`` of the value
>>> s = pd.Series([-4, -2, 0, 2, 4])
>>> s.sort_values(key=np.sin)
1 -2
4 4
2 0
0 -4
3 2
dtype: int64
More complicated user-defined functions can be used,
as long as they expect a Series and return an array-like
>>> s.sort_values(key=lambda x: (np.tan(x.cumsum())))
0 -4
3 2
4 4
1 -2
2 0
dtype: int64
- std(self, axis=None, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)
- Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters
----------
axis : {index (0)}
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
scalar or Series (if level specified)
Notes
-----
To have the same behaviour as `numpy.std`, use `ddof=0` (instead of the
default `ddof=1`)
Examples
--------
>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
... 'age': [21, 25, 62, 43],
... 'height': [1.61, 1.87, 1.49, 2.01]}
... ).set_index('person_id')
>>> df
age height
person_id
0 21 1.61
1 25 1.87
2 62 1.49
3 43 2.01
The standard deviation of the columns can be found as follows:
>>> df.std()
age 18.786076
height 0.237417
Alternatively, `ddof=0` can be set to normalize by N instead of N-1:
>>> df.std(ddof=0)
age 16.269219
height 0.205609
- sub(self, other, level=None, fill_value=None, axis=0)
- Return Subtraction of series and other, element-wise (binary operator `sub`).
Equivalent to ``series - other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.rsub : Reverse of the Subtraction operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.subtract(b, fill_value=0)
a 0.0
b 1.0
c 1.0
d -1.0
e NaN
dtype: float64
- subtract = sub(self, other, level=None, fill_value=None, axis=0)
- sum(self, axis=None, skipna=True, level=None, numeric_only=None, min_count=0, **kwargs)
- Return the sum of the values over the requested axis.
This is equivalent to the method ``numpy.sum``.
Parameters
----------
axis : {index (0)}
Axis for the function to be applied on.
skipna : bool, default True
Exclude NA/null values when computing the result.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.
**kwargs
Additional keyword arguments to be passed to the function.
Returns
-------
scalar or Series (if level specified)
See Also
--------
Series.sum : Return the sum.
Series.min : Return the minimum.
Series.max : Return the maximum.
Series.idxmin : Return the index of the minimum.
Series.idxmax : Return the index of the maximum.
DataFrame.sum : Return the sum over the requested axis.
DataFrame.min : Return the minimum over the requested axis.
DataFrame.max : Return the maximum over the requested axis.
DataFrame.idxmin : Return the index of the minimum over the requested axis.
DataFrame.idxmax : Return the index of the maximum over the requested axis.
Examples
--------
>>> idx = pd.MultiIndex.from_arrays([
... ['warm', 'warm', 'cold', 'cold'],
... ['dog', 'falcon', 'fish', 'spider']],
... names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded animal
warm dog 4
falcon 2
cold fish 0
spider 8
Name: legs, dtype: int64
>>> s.sum()
14
By default, the sum of an empty or all-NA Series is ``0``.
>>> pd.Series([], dtype="float64").sum() # min_count=0 is the default
0.0
This can be controlled with the ``min_count`` parameter. For example, if
you'd like the sum of an empty series to be NaN, pass ``min_count=1``.
>>> pd.Series([], dtype="float64").sum(min_count=1)
nan
Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
empty series identically.
>>> pd.Series([np.nan]).sum()
0.0
>>> pd.Series([np.nan]).sum(min_count=1)
nan
- swaplevel(self, i=-2, j=-1, copy=True) -> 'Series'
- Swap levels i and j in a :class:`MultiIndex`.
Default is to swap the two innermost levels of the index.
Parameters
----------
i, j : int or str
Levels of the indices to be swapped. Can pass level name as string.
copy : bool, default True
Whether to copy underlying data.
Returns
-------
Series
Series with levels swapped in MultiIndex.
Examples
--------
>>> s = pd.Series(
... ["A", "B", "A", "C"],
... index=[
... ["Final exam", "Final exam", "Coursework", "Coursework"],
... ["History", "Geography", "History", "Geography"],
... ["January", "February", "March", "April"],
... ],
... )
>>> s
Final exam History January A
Geography February B
Coursework History March A
Geography April C
dtype: object
In the following example, we will swap the levels of the indices.
Here, we will swap the levels column-wise, but levels can be swapped row-wise
in a similar manner. Note that column-wise is the default behaviour.
By not supplying any arguments for i and j, we swap the last and second to
last indices.
>>> s.swaplevel()
Final exam January History A
February Geography B
Coursework March History A
April Geography C
dtype: object
By supplying one argument, we can choose which index to swap the last
index with. We can for example swap the first index with the last one as
follows.
>>> s.swaplevel(0)
January History Final exam A
February Geography Final exam B
March History Coursework A
April Geography Coursework C
dtype: object
We can also define explicitly which indices we want to swap by supplying values
for both i and j. Here, we for example swap the first and second indices.
>>> s.swaplevel(0, 1)
History Final exam January A
Geography Final exam February B
History Coursework March A
Geography Coursework April C
dtype: object
- take(self, indices, axis=0, is_copy=None, **kwargs) -> 'Series'
- Return the elements in the given *positional* indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
Parameters
----------
indices : array-like
An array of ints indicating which positions to take.
axis : {0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. ``0`` means that we are
selecting rows, ``1`` means that we are selecting columns.
is_copy : bool
Before pandas 1.0, ``is_copy=False`` can be specified to ensure
that the return value is an actual copy. Starting with pandas 1.0,
``take`` always returns a copy, and the keyword is therefore
deprecated.
.. deprecated:: 1.0.0
**kwargs
For compatibility with :meth:`numpy.take`. Has no effect on the
output.
Returns
-------
taken : same type as caller
An array-like containing the elements taken from the object.
See Also
--------
DataFrame.loc : Select a subset of a DataFrame by labels.
DataFrame.iloc : Select a subset of a DataFrame by positions.
numpy.take : Take elements from an array along an axis.
Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', np.nan)],
... columns=['name', 'class', 'max_speed'],
... index=[0, 2, 3, 1])
>>> df
name class max_speed
0 falcon bird 389.0
2 parrot bird 24.0
3 lion mammal 80.5
1 monkey mammal NaN
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That's because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
>>> df.take([0, 3])
name class max_speed
0 falcon bird 389.0
1 monkey mammal NaN
Take elements at indices 1 and 2 along the axis 1 (column selection).
>>> df.take([1, 2], axis=1)
class max_speed
0 bird 389.0
2 bird 24.0
3 mammal 80.5
1 mammal NaN
We may take elements using negative integers for positive indices,
starting from the end of the object, just like with Python lists.
>>> df.take([-1, -2])
name class max_speed
1 monkey mammal NaN
3 lion mammal 80.5
- to_dict(self, into=<class 'dict'>)
- Convert Series to {label -> value} dict or dict-like object.
Parameters
----------
into : class, default dict
The collections.abc.Mapping subclass to use as the return
object. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
Returns
-------
collections.abc.Mapping
Key-value representation of Series.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s.to_dict()
{0: 1, 1: 2, 2: 3, 3: 4}
>>> from collections import OrderedDict, defaultdict
>>> s.to_dict(OrderedDict)
OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
>>> dd = defaultdict(list)
>>> s.to_dict(dd)
defaultdict(<class 'list'>, {0: 1, 1: 2, 2: 3, 3: 4})
- to_frame(self, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Convert Series to DataFrame.
Parameters
----------
name : object, optional
The passed name should substitute for the series name (if it has
one).
Returns
-------
DataFrame
DataFrame representation of Series.
Examples
--------
>>> s = pd.Series(["a", "b", "c"],
... name="vals")
>>> s.to_frame()
vals
0 a
1 b
2 c
- to_markdown(self, buf: 'IO[str] | None' = None, mode: 'str' = 'wt', index: 'bool' = True, storage_options: 'StorageOptions' = None, **kwargs) -> 'str | None'
- Print Series in Markdown-friendly format.
.. versionadded:: 1.0.0
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
mode : str, optional
Mode in which file is opened, "wt" by default.
index : bool, optional, default True
Add index (row) labels.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
**kwargs
These parameters will be passed to `tabulate <https://pypi.org/project/tabulate>`_.
Returns
-------
str
Series in Markdown-friendly format.
Notes
-----
Requires the `tabulate <https://pypi.org/project/tabulate>`_ package.
Examples
--------
>>> s = pd.Series(["elk", "pig", "dog", "quetzal"], name="animal")
>>> print(s.to_markdown())
| | animal |
|---:|:---------|
| 0 | elk |
| 1 | pig |
| 2 | dog |
| 3 | quetzal |
Output markdown with a tabulate option.
>>> print(s.to_markdown(tablefmt="grid"))
+----+----------+
| | animal |
+====+==========+
| 0 | elk |
+----+----------+
| 1 | pig |
+----+----------+
| 2 | dog |
+----+----------+
| 3 | quetzal |
+----+----------+
- to_period(self, freq=None, copy=True) -> 'Series'
- Convert Series from DatetimeIndex to PeriodIndex.
Parameters
----------
freq : str, default None
Frequency associated with the PeriodIndex.
copy : bool, default True
Whether or not to return a copy.
Returns
-------
Series
Series with index converted to PeriodIndex.
- to_string(self, buf=None, na_rep='NaN', float_format=None, header=True, index=True, length=False, dtype=False, name=False, max_rows=None, min_rows=None)
- Render a string representation of the Series.
Parameters
----------
buf : StringIO-like, optional
Buffer to write to.
na_rep : str, optional
String representation of NaN to use, default 'NaN'.
float_format : one-parameter function, optional
Formatter function to apply to columns' elements if they are
floats, default None.
header : bool, default True
Add the Series header (index name).
index : bool, optional
Add index (row) labels, default True.
length : bool, default False
Add the Series length.
dtype : bool, default False
Add the Series dtype.
name : bool, default False
Add the Series name if not None.
max_rows : int, optional
Maximum number of rows to show before truncating. If None, show
all.
min_rows : int, optional
The number of rows to display in a truncated repr (when number
of rows is above `max_rows`).
Returns
-------
str or None
String representation of Series if ``buf=None``, otherwise None.
- to_timestamp(self, freq=None, how='start', copy=True) -> 'Series'
- Cast to DatetimeIndex of Timestamps, at *beginning* of period.
Parameters
----------
freq : str, default frequency of PeriodIndex
Desired frequency.
how : {'s', 'e', 'start', 'end'}
Convention for converting period to timestamp; start of period
vs. end.
copy : bool, default True
Whether or not to return a copy.
Returns
-------
Series with DatetimeIndex
- transform(self, func: 'AggFuncType', axis: 'Axis' = 0, *args, **kwargs) -> 'DataFrame | Series'
- Call ``func`` on self producing a Series with the same axis shape as self.
Parameters
----------
func : function, str, list-like or dict-like
Function to use for transforming the data. If a function, must either
work when passed a Series or when passed to Series.apply. If func
is both list-like and dict-like, dict-like behavior takes precedence.
Accepted combinations are:
- function
- string function name
- list-like of functions and/or function names, e.g. ``[np.exp, 'sqrt']``
- dict-like of axis labels -> functions, function names or list-like of such.
axis : {0 or 'index'}
Parameter needed for compatibility with DataFrame.
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Returns
-------
Series
A Series that must have the same length as self.
Raises
------
ValueError : If the returned Series has a different length than self.
See Also
--------
Series.agg : Only perform aggregating type operations.
Series.apply : Invoke function on a Series.
Notes
-----
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
Examples
--------
>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df
A B
0 0 1
1 1 2
2 2 3
>>> df.transform(lambda x: x + 1)
A B
0 1 2
1 2 3
2 3 4
Even though the resulting Series must have the same length as the
input Series, it is possible to provide several input functions:
>>> s = pd.Series(range(3))
>>> s
0 0
1 1
2 2
dtype: int64
>>> s.transform([np.sqrt, np.exp])
sqrt exp
0 0.000000 1.000000
1 1.000000 2.718282
2 1.414214 7.389056
You can call transform on a GroupBy object:
>>> df = pd.DataFrame({
... "Date": [
... "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
... "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
... "Data": [5, 8, 6, 1, 50, 100, 60, 120],
... })
>>> df
Date Data
0 2015-05-08 5
1 2015-05-07 8
2 2015-05-06 6
3 2015-05-05 1
4 2015-05-08 50
5 2015-05-07 100
6 2015-05-06 60
7 2015-05-05 120
>>> df.groupby('Date')['Data'].transform('sum')
0 55
1 108
2 66
3 121
4 55
5 108
6 66
7 121
Name: Data, dtype: int64
>>> df = pd.DataFrame({
... "c": [1, 1, 1, 2, 2, 2, 2],
... "type": ["m", "n", "o", "m", "m", "n", "n"]
... })
>>> df
c type
0 1 m
1 1 n
2 1 o
3 2 m
4 2 m
5 2 n
6 2 n
>>> df['size'] = df.groupby('c')['type'].transform(len)
>>> df
c type size
0 1 m 3
1 1 n 3
2 1 o 3
3 2 m 4
4 2 m 4
5 2 n 4
6 2 n 4
- truediv(self, other, level=None, fill_value=None, axis=0)
- Return Floating division of series and other, element-wise (binary operator `truediv`).
Equivalent to ``series / other``, but with support to substitute a fill_value for
missing data in either one of the inputs.
Parameters
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result of filling (at that location) will be missing.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level.
Returns
-------
Series
The result of the operation.
See Also
--------
Series.rtruediv : Reverse of the Floating division operator, see
`Python documentation
<https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types>`_
for more details.
Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
>>> a.divide(b, fill_value=0)
a 1.0
b inf
c inf
d 0.0
e NaN
dtype: float64
- unique(self) -> 'ArrayLike'
- Return unique values of Series object.
Uniques are returned in order of appearance. Hash table-based unique,
therefore does NOT sort.
Returns
-------
ndarray or ExtensionArray
The unique values returned as a NumPy array. See Notes.
See Also
--------
unique : Top-level unique method for any 1-d array-like object.
Index.unique : Return Index with unique values from an Index object.
Notes
-----
Returns the unique values as a NumPy array. In case of an
extension-array backed Series, a new
:class:`~api.extensions.ExtensionArray` of that type with just
the unique values is returned. This includes
* Categorical
* Period
* Datetime with Timezone
* Interval
* Sparse
* IntegerNA
See Examples section.
Examples
--------
>>> pd.Series([2, 1, 3, 3], name='A').unique()
array([2, 1, 3])
>>> pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)]).unique()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.Series([pd.Timestamp('2016-01-01', tz='US/Eastern')
... for _ in range(3)]).unique()
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]
An Categorical will return categories in the order of
appearance and with the same dtype.
>>> pd.Series(pd.Categorical(list('baabc'))).unique()
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> pd.Series(pd.Categorical(list('baabc'), categories=list('abc'),
... ordered=True)).unique()
['b', 'a', 'c']
Categories (3, object): ['a' < 'b' < 'c']
- unstack(self, level=-1, fill_value=None) -> 'DataFrame'
- Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
Parameters
----------
level : int, str, or list of these, default last level
Level(s) to unstack, can pass level name.
fill_value : scalar value, default None
Value to use when replacing NaN values.
Returns
-------
DataFrame
Unstacked Series.
Notes
-----
Reference :ref:`the user guide <reshaping.stacking>` for more examples.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4],
... index=pd.MultiIndex.from_product([['one', 'two'],
... ['a', 'b']]))
>>> s
one a 1
b 2
two a 3
b 4
dtype: int64
>>> s.unstack(level=-1)
a b
one 1 2
two 3 4
>>> s.unstack(level=0)
one two
a 1 3
b 2 4
- update(self, other) -> 'None'
- Modify Series in place using values from passed Series.
Uses non-NA values from passed Series to make updates. Aligns
on index.
Parameters
----------
other : Series, or object coercible into Series
Examples
--------
>>> s = pd.Series([1, 2, 3])
>>> s.update(pd.Series([4, 5, 6]))
>>> s
0 4
1 5
2 6
dtype: int64
>>> s = pd.Series(['a', 'b', 'c'])
>>> s.update(pd.Series(['d', 'e'], index=[0, 2]))
>>> s
0 d
1 b
2 e
dtype: object
>>> s = pd.Series([1, 2, 3])
>>> s.update(pd.Series([4, 5, 6, 7, 8]))
>>> s
0 4
1 5
2 6
dtype: int64
If ``other`` contains NaNs the corresponding values are not updated
in the original Series.
>>> s = pd.Series([1, 2, 3])
>>> s.update(pd.Series([4, np.nan, 6]))
>>> s
0 4
1 2
2 6
dtype: int64
``other`` can also be a non-Series object type
that is coercible into a Series
>>> s = pd.Series([1, 2, 3])
>>> s.update([4, np.nan, 6])
>>> s
0 4
1 2
2 6
dtype: int64
>>> s = pd.Series([1, 2, 3])
>>> s.update({1: 9})
>>> s
0 1
1 9
2 3
dtype: int64
- var(self, axis=None, skipna=True, level=None, ddof=1, numeric_only=None, **kwargs)
- Return unbiased variance over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.
Parameters
----------
axis : {index (0)}
skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar.
ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof,
where N represents the number of elements.
numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
Returns
-------
scalar or Series (if level specified)
Examples
--------
>>> df = pd.DataFrame({'person_id': [0, 1, 2, 3],
... 'age': [21, 25, 62, 43],
... 'height': [1.61, 1.87, 1.49, 2.01]}
... ).set_index('person_id')
>>> df
age height
person_id
0 21 1.61
1 25 1.87
2 62 1.49
3 43 2.01
>>> df.var()
age 352.916667
height 0.056367
Alternatively, ``ddof=0`` can be set to normalize by N instead of N-1:
>>> df.var(ddof=0)
age 264.687500
height 0.042275
- view(self, dtype: 'Dtype | None' = None) -> 'Series'
- Create a new view of the Series.
This function will return a new Series with a view of the same
underlying values in memory, optionally reinterpreted with a new data
type. The new data type must preserve the same size in bytes as to not
cause index misalignment.
Parameters
----------
dtype : data type
Data type object or one of their string representations.
Returns
-------
Series
A new Series object as a view of the same data in memory.
See Also
--------
numpy.ndarray.view : Equivalent numpy function to create a new view of
the same data in memory.
Notes
-----
Series are instantiated with ``dtype=float64`` by default. While
``numpy.ndarray.view()`` will return a view with the same data type as
the original array, ``Series.view()`` (without specified dtype)
will try using ``float64`` and may fail if the original data type size
in bytes is not the same.
Examples
--------
>>> s = pd.Series([-2, -1, 0, 1, 2], dtype='int8')
>>> s
0 -2
1 -1
2 0
3 1
4 2
dtype: int8
The 8 bit signed integer representation of `-1` is `0b11111111`, but
the same bytes represent 255 if read as an 8 bit unsigned integer:
>>> us = s.view('uint8')
>>> us
0 254
1 255
2 0
3 1
4 2
dtype: uint8
The views share the same underlying values:
>>> us[0] = 128
>>> s
0 -128
1 -1
2 0
3 1
4 2
dtype: int8
- where(self, cond, other=<no_default>, inplace=False, axis=None, level=None, errors=<no_default>, try_cast=<no_default>)
- Replace values where the condition is False.
Parameters
----------
cond : bool Series/DataFrame, array-like, or callable
Where `cond` is True, keep the original value. Where
False, replace with corresponding value from `other`.
If `cond` is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn't check it).
other : scalar, Series/DataFrame, or callable
Entries where `cond` is False are replaced with
corresponding value from `other`.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn't check it).
inplace : bool, default False
Whether to perform the operation in place on the data.
axis : int, default None
Alignment axis if needed.
level : int, default None
Alignment level if needed.
errors : str, {'raise', 'ignore'}, default 'raise'
Note that currently this parameter won't affect
the results and will always coerce to a suitable dtype.
- 'raise' : allow exceptions to be raised.
- 'ignore' : suppress exceptions. On error return original object.
try_cast : bool, default None
Try to cast the result back to the input type (if possible).
.. deprecated:: 1.3.0
Manually cast back if necessary.
Returns
-------
Same type as caller or None if ``inplace=True``.
See Also
--------
:func:`DataFrame.mask` : Return an object of same shape as
self.
Notes
-----
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if ``cond`` is ``True`` the
element is used; otherwise the corresponding element from the DataFrame
``other`` is used.
The signature for :func:`DataFrame.where` differs from
:func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
``np.where(m, df1, df2)``.
For further details and examples see the ``where`` documentation in
:ref:`indexing <indexing.where_mask>`.
Examples
--------
>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
>>> s.mask(s > 0)
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
>>> s.where(s > 1, 10)
0 10
1 10
2 2
3 3
4 4
dtype: int64
>>> s.mask(s > 1, 10)
0 0
1 1
2 10
3 10
4 10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df)
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> df.where(m, -df) == np.where(m, df, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> df.where(m, -df) == df.mask(~m, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
Readonly properties defined here:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- axes
- Return a list of the row axis labels.
- dtype
- Return the dtype object of the underlying data.
- dtypes
- Return the dtype object of the underlying data.
- hasnans
- Return True if there are any NaNs.
Enables various performance speedups.
- values
- Return Series as ndarray or ndarray-like depending on the dtype.
.. warning::
We recommend using :attr:`Series.array` or
:meth:`Series.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
numpy.ndarray or ndarray-like
See Also
--------
Series.array : Reference to the underlying data.
Series.to_numpy : A NumPy array representing the underlying data.
Examples
--------
>>> pd.Series([1, 2, 3]).values
array([1, 2, 3])
>>> pd.Series(list('aabc')).values
array(['a', 'a', 'b', 'c'], dtype=object)
>>> pd.Series(list('aabc')).astype('category').values
['a', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
Timezone aware datetime data is converted to UTC:
>>> pd.Series(pd.date_range('20130101', periods=3,
... tz='US/Eastern')).values
array(['2013-01-01T05:00:00.000000000',
'2013-01-02T05:00:00.000000000',
'2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')
Data descriptors defined here:
- index
- The index (axis labels) of the Series.
- name
- Return the name of the Series.
The name of a Series becomes its index or column name if it is used
to form a DataFrame. It is also used whenever displaying the Series
using the interpreter.
Returns
-------
label (hashable object)
The name of the Series, also the column name if part of a DataFrame.
See Also
--------
Series.rename : Sets the Series name when given a scalar input.
Index.name : Corresponding Index property.
Examples
--------
The Series name can be set initially when calling the constructor.
>>> s = pd.Series([1, 2, 3], dtype=np.int64, name='Numbers')
>>> s
0 1
1 2
2 3
Name: Numbers, dtype: int64
>>> s.name = "Integers"
>>> s
0 1
1 2
2 3
Name: Integers, dtype: int64
The name of a Series within a DataFrame is its column name.
>>> df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
... columns=["Odd Numbers", "Even Numbers"])
>>> df
Odd Numbers Even Numbers
0 1 2
1 3 4
2 5 6
>>> df["Even Numbers"].name
'Even Numbers'
Data and other attributes defined here:
- __annotations__ = {'_metadata': 'list[str]', '_mgr': 'SingleManager', '_name': 'Hashable', 'div': 'Callable[[Series, Any], Series]', 'index': 'Index', 'rdiv': 'Callable[[Series, Any], Series]'}
- cat = <class 'pandas.core.arrays.categorical.CategoricalAccessor'>
- Accessor object for categorical properties of the Series values.
Be aware that assigning to `categories` is a inplace operation, while all
methods return new categorical data per default (but can be called with
`inplace=True`).
Parameters
----------
data : Series or CategoricalIndex
Examples
--------
>>> s = pd.Series(list("abbccc")).astype("category")
>>> s
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (3, object): ['a', 'b', 'c']
>>> s.cat.categories
Index(['a', 'b', 'c'], dtype='object')
>>> s.cat.rename_categories(list("cba"))
0 c
1 b
2 b
3 a
4 a
5 a
dtype: category
Categories (3, object): ['c', 'b', 'a']
>>> s.cat.reorder_categories(list("cba"))
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (3, object): ['c', 'b', 'a']
>>> s.cat.add_categories(["d", "e"])
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (5, object): ['a', 'b', 'c', 'd', 'e']
>>> s.cat.remove_categories(["a", "c"])
0 NaN
1 b
2 b
3 NaN
4 NaN
5 NaN
dtype: category
Categories (1, object): ['b']
>>> s1 = s.cat.add_categories(["d", "e"])
>>> s1.cat.remove_unused_categories()
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (3, object): ['a', 'b', 'c']
>>> s.cat.set_categories(list("abcde"))
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (5, object): ['a', 'b', 'c', 'd', 'e']
>>> s.cat.as_ordered()
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (3, object): ['a' < 'b' < 'c']
>>> s.cat.as_unordered()
0 a
1 b
2 b
3 c
4 c
5 c
dtype: category
Categories (3, object): ['a', 'b', 'c']
- dt = <class 'pandas.core.indexes.accessors.CombinedDatetimelikeProperties'>
- plot = <class 'pandas.plotting._core.PlotAccessor'>
- Make plots of Series or DataFrame.
Uses the backend specified by the
option ``plotting.backend``. By default, matplotlib is used.
Parameters
----------
data : Series or DataFrame
The object for which the method is called.
x : label or position, default None
Only used if data is a DataFrame.
y : label, position or list of label, positions, default None
Allows plotting of one column versus another. Only used if data is a
DataFrame.
kind : str
The kind of plot to produce:
- 'line' : line plot (default)
- 'bar' : vertical bar plot
- 'barh' : horizontal bar plot
- 'hist' : histogram
- 'box' : boxplot
- 'kde' : Kernel Density Estimation plot
- 'density' : same as 'kde'
- 'area' : area plot
- 'pie' : pie plot
- 'scatter' : scatter plot (DataFrame only)
- 'hexbin' : hexbin plot (DataFrame only)
ax : matplotlib axes object, default None
An axes of the current figure.
subplots : bool, default False
Make separate subplots for each column.
sharex : bool, default True if ax is None else False
In case ``subplots=True``, share x axis and set some x axis labels
to invisible; defaults to True if ax is None otherwise False if
an ax is passed in; Be aware, that passing in both an ax and
``sharex=True`` will alter all x axis labels for all axis in a figure.
sharey : bool, default False
In case ``subplots=True``, share y axis and set some y axis labels to invisible.
layout : tuple, optional
(rows, columns) for the layout of subplots.
figsize : a tuple (width, height) in inches
Size of a figure object.
use_index : bool, default True
Use index as ticks for x axis.
title : str or list
Title to use for the plot. If a string is passed, print the string
at the top of the figure. If a list is passed and `subplots` is
True, print each item in the list above the corresponding subplot.
grid : bool, default None (matlab style default)
Axis grid lines.
legend : bool or {'reverse'}
Place legend on axis subplots.
style : list or dict
The matplotlib line style per column.
logx : bool or 'sym', default False
Use log scaling or symlog scaling on x axis.
.. versionchanged:: 0.25.0
logy : bool or 'sym' default False
Use log scaling or symlog scaling on y axis.
.. versionchanged:: 0.25.0
loglog : bool or 'sym', default False
Use log scaling or symlog scaling on both x and y axes.
.. versionchanged:: 0.25.0
xticks : sequence
Values to use for the xticks.
yticks : sequence
Values to use for the yticks.
xlim : 2-tuple/list
Set the x limits of the current axes.
ylim : 2-tuple/list
Set the y limits of the current axes.
xlabel : label, optional
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the
x-column name for planar plots.
.. versionadded:: 1.1.0
.. versionchanged:: 1.2.0
Now applicable to planar plots (`scatter`, `hexbin`).
ylabel : label, optional
Name to use for the ylabel on y-axis. Default will show no ylabel, or the
y-column name for planar plots.
.. versionadded:: 1.1.0
.. versionchanged:: 1.2.0
Now applicable to planar plots (`scatter`, `hexbin`).
rot : int, default None
Rotation for ticks (xticks for vertical, yticks for horizontal
plots).
fontsize : int, default None
Font size for xticks and yticks.
colormap : str or matplotlib colormap object, default None
Colormap to select colors from. If string, load colormap with that
name from matplotlib.
colorbar : bool, optional
If True, plot colorbar (only relevant for 'scatter' and 'hexbin'
plots).
position : float
Specify relative alignments for bar plot layout.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center).
table : bool, Series or DataFrame, default False
If True, draw a table using the data in the DataFrame and the data
will be transposed to meet matplotlib's default layout.
If a Series or DataFrame is passed, use passed data to draw a
table.
yerr : DataFrame, Series, array-like, dict and str
See :ref:`Plotting with Error Bars <visualization.errorbars>` for
detail.
xerr : DataFrame, Series, array-like, dict and str
Equivalent to yerr.
stacked : bool, default False in line and bar plots, and True in area plot
If True, create stacked plot.
sort_columns : bool, default False
Sort column names to determine plot ordering.
secondary_y : bool or sequence, default False
Whether to plot on the secondary y-axis if a list/tuple, which
columns to plot on secondary y-axis.
mark_right : bool, default True
When using a secondary_y axis, automatically mark the column
labels with "(right)" in the legend.
include_bool : bool, default is False
If True, boolean values can be plotted.
backend : str, default None
Backend to use instead of the backend specified in the option
``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
specify the ``plotting.backend`` for the whole session, set
``pd.options.plotting.backend``.
.. versionadded:: 1.0.0
**kwargs
Options to pass to matplotlib plotting method.
Returns
-------
:class:`matplotlib.axes.Axes` or numpy.ndarray of them
If the backend is not the default matplotlib one, the return value
will be the object returned by the backend.
Notes
-----
- See matplotlib documentation online for more on this subject
- If `kind` = 'bar' or 'barh', you can specify relative alignments
for bar plot layout by `position` keyword.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center)
- sparse = <class 'pandas.core.arrays.sparse.accessor.SparseAccessor'>
- Accessor for SparseSparse from other sparse matrix data types.
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- argmax(self, axis=None, skipna: 'bool' = True, *args, **kwargs) -> 'int'
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs) -> 'int'
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- is_monotonic
- Return boolean if values in the object are
monotonic_increasing.
Returns
-------
bool
- is_monotonic_decreasing
- Return boolean if values in the object are
monotonic_decreasing.
Returns
-------
bool
- is_monotonic_increasing
- Alias for is_monotonic.
- is_unique
- Return boolean if values in the object are unique.
Returns
-------
bool
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- shape
- Return a tuple of the shape of the underlying data.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __and__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __or__(self, other)
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
- __xor__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.generic.NDFrame:
- __abs__(self: 'NDFrameT') -> 'NDFrameT'
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str', *inputs: 'Any', **kwargs: 'Any')
- __array_wrap__(self, result: 'np.ndarray', context: 'tuple[Callable, tuple[Any, ...], int] | None' = None)
- Gets called after a ufunc and other functions.
Parameters
----------
result: np.ndarray
The result of the ufunc or other function called on the NumPy array
returned by __array__
context: tuple of (func, tuple, int)
This parameter is returned by ufuncs as a 3-element tuple: (name of the
ufunc, arguments of the ufunc, domain of the ufunc), but is not set by
other numpy functions.q
Notes
-----
Series implements __array_ufunc_ so this not called for ufunc on Series.
- __bool__ = __nonzero__(self)
- __contains__(self, key) -> 'bool_t'
- True if the key is in the info axis
- __copy__(self: 'NDFrameT', deep: 'bool_t' = True) -> 'NDFrameT'
- __deepcopy__(self: 'NDFrameT', memo=None) -> 'NDFrameT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __delitem__(self, key) -> 'None'
- Delete item
- __finalize__(self: 'NDFrameT', other, method: 'str | None' = None, **kwargs) -> 'NDFrameT'
- Propagate metadata from other to self.
Parameters
----------
other : the object from which to get the attributes that we are going
to propagate
method : str, optional
A passed method name providing context on where ``__finalize__``
was called.
.. warning::
The value passed as `method` are not currently considered
stable across pandas releases.
- __getattr__(self, name: 'str')
- After regular attribute access, try looking up the name
This allows simpler access to columns for interactive use.
- __getstate__(self) -> 'dict[str, Any]'
- __iadd__(self, other)
- __iand__(self, other)
- __ifloordiv__(self, other)
- __imod__(self, other)
- __imul__(self, other)
- __invert__(self)
- __ior__(self, other)
- __ipow__(self, other)
- __isub__(self, other)
- __itruediv__(self, other)
- __ixor__(self, other)
- __neg__(self)
- __nonzero__(self)
- __pos__(self)
- __round__(self: 'NDFrameT', decimals: 'int' = 0) -> 'NDFrameT'
- __setattr__(self, name: 'str', value) -> 'None'
- After regular attribute access, try setting the name
This allows simpler access to columns for interactive use.
- __setstate__(self, state)
- abs(self: 'NDFrameT') -> 'NDFrameT'
- Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
Returns
-------
abs
Series/DataFrame containing the absolute value of each element.
See Also
--------
numpy.absolute : Calculate the absolute value element-wise.
Notes
-----
For ``complex`` inputs, ``1.2 + 1j``, the absolute value is
:math:`\sqrt{ a^2 + b^2 }`.
Examples
--------
Absolute numeric values in a Series.
>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0 1.10
1 2.00
2 3.33
3 4.00
dtype: float64
Absolute numeric values in a Series with complex numbers.
>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0 1.56205
dtype: float64
Absolute numeric values in a Series with a Timedelta element.
>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0 1 days
dtype: timedelta64[ns]
Select rows with data closest to certain value using argsort (from
`StackOverflow <https://stackoverflow.com/a/17758115>`__).
>>> df = pd.DataFrame({
... 'a': [4, 5, 6, 7],
... 'b': [10, 20, 30, 40],
... 'c': [100, 50, -30, -50]
... })
>>> df
a b c
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
>>> df.loc[(df.c - 43).abs().argsort()]
a b c
1 5 20 50
0 4 10 100
2 6 30 -30
3 7 40 -50
- add_prefix(self: 'NDFrameT', prefix: 'str') -> 'NDFrameT'
- Prefix labels with string `prefix`.
For Series, the row labels are prefixed.
For DataFrame, the column labels are prefixed.
Parameters
----------
prefix : str
The string to add before each label.
Returns
-------
Series or DataFrame
New Series or DataFrame with updated labels.
See Also
--------
Series.add_suffix: Suffix row labels with string `suffix`.
DataFrame.add_suffix: Suffix column labels with string `suffix`.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.add_prefix('item_')
item_0 1
item_1 2
item_2 3
item_3 4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
A B
0 1 3
1 2 4
2 3 5
3 4 6
>>> df.add_prefix('col_')
col_A col_B
0 1 3
1 2 4
2 3 5
3 4 6
- add_suffix(self: 'NDFrameT', suffix: 'str') -> 'NDFrameT'
- Suffix labels with string `suffix`.
For Series, the row labels are suffixed.
For DataFrame, the column labels are suffixed.
Parameters
----------
suffix : str
The string to add after each label.
Returns
-------
Series or DataFrame
New Series or DataFrame with updated labels.
See Also
--------
Series.add_prefix: Prefix row labels with string `prefix`.
DataFrame.add_prefix: Prefix column labels with string `prefix`.
Examples
--------
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: int64
>>> s.add_suffix('_item')
0_item 1
1_item 2
2_item 3
3_item 4
dtype: int64
>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
A B
0 1 3
1 2 4
2 3 5
3 4 6
>>> df.add_suffix('_col')
A_col B_col
0 1 3
1 2 4
2 3 5
3 4 6
- asof(self, where, subset=None)
- Return the last row(s) without any NaNs before `where`.
The last row (for each element in `where`, if list) without any
NaN is taken.
In case of a :class:`~pandas.DataFrame`, the last row without NaN
considering only the subset of columns (if not `None`)
If there is no good value, NaN is returned for a Series or
a Series of NaN values for a DataFrame
Parameters
----------
where : date or array-like of dates
Date(s) before which the last row(s) are returned.
subset : str or array-like of str, default `None`
For DataFrame, if not `None`, only use these columns to
check for NaNs.
Returns
-------
scalar, Series, or DataFrame
The return can be:
* scalar : when `self` is a Series and `where` is a scalar
* Series: when `self` is a Series and `where` is an array-like,
or when `self` is a DataFrame and `where` is a scalar
* DataFrame : when `self` is a DataFrame and `where` is an
array-like
Return scalar, Series, or DataFrame.
See Also
--------
merge_asof : Perform an asof merge. Similar to left join.
Notes
-----
Dates are assumed to be sorted. Raises if this is not the case.
Examples
--------
A Series and a scalar `where`.
>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10 1.0
20 2.0
30 NaN
40 4.0
dtype: float64
>>> s.asof(20)
2.0
For a sequence `where`, a Series is returned. The first value is
NaN, because the first element of `where` is before the first
index value.
>>> s.asof([5, 20])
5 NaN
20 2.0
dtype: float64
Missing values are not considered. The following is ``2.0``, not
NaN, even though NaN is at the index location for ``30``.
>>> s.asof(30)
2.0
Take all columns into consideration
>>> df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
... 'b': [None, None, None, None, 500]},
... index=pd.DatetimeIndex(['2018-02-27 09:01:00',
... '2018-02-27 09:02:00',
... '2018-02-27 09:03:00',
... '2018-02-27 09:04:00',
... '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
... '2018-02-27 09:04:30']))
a b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN
Take a single column into consideration
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
... '2018-02-27 09:04:30']),
... subset=['a'])
a b
2018-02-27 09:03:30 30.0 NaN
2018-02-27 09:04:30 40.0 NaN
- astype(self: 'NDFrameT', dtype, copy: 'bool_t' = True, errors: 'str' = 'raise') -> 'NDFrameT'
- Cast a pandas object to a specified dtype ``dtype``.
Parameters
----------
dtype : data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast entire pandas object to
the same type. Alternatively, use {col: dtype, ...}, where col is a
column label and dtype is a numpy.dtype or Python type to cast one
or more of the DataFrame's columns to column-specific types.
copy : bool, default True
Return a copy when ``copy=True`` (be very careful setting
``copy=False`` as changes to values then may propagate to other
pandas objects).
errors : {'raise', 'ignore'}, default 'raise'
Control raising of exceptions on invalid data for provided dtype.
- ``raise`` : allow exceptions to be raised
- ``ignore`` : suppress exceptions. On error return original object.
Returns
-------
casted : same type as caller
See Also
--------
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.
Notes
-----
.. deprecated:: 1.3.0
Using ``astype`` to convert from timezone-naive dtype to
timezone-aware dtype is deprecated and will raise in a
future version. Use :meth:`Series.dt.tz_localize` instead.
Examples
--------
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1 int64
col2 int64
dtype: object
Cast all columns to int32:
>>> df.astype('int32').dtypes
col1 int32
col2 int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({'col1': 'int32'}).dtypes
col1 int32
col2 int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0 1
1 2
dtype: int32
>>> ser.astype('int64')
0 1
1 2
dtype: int64
Convert to categorical type:
>>> ser.astype('category')
0 1
1 2
dtype: category
Categories (2, int64): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(
... categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
Note that using ``copy=False`` and changing data on a new
pandas object may propagate changes:
>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1 # note that s1[0] has changed too
0 10
1 2
dtype: int64
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0 2020-01-01
1 2020-01-02
2 2020-01-03
dtype: datetime64[ns]
- at_time(self: 'NDFrameT', time, asof: 'bool_t' = False, axis=None) -> 'NDFrameT'
- Select values at particular time of day (e.g., 9:30AM).
Parameters
----------
time : datetime.time or str
axis : {0 or 'index', 1 or 'columns'}, default 0
Returns
-------
Series or DataFrame
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
between_time : Select values between particular times of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_at_time : Get just the index locations for
values at particular time of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 00:00:00 1
2018-04-09 12:00:00 2
2018-04-10 00:00:00 3
2018-04-10 12:00:00 4
>>> ts.at_time('12:00')
A
2018-04-09 12:00:00 2
2018-04-10 12:00:00 4
- backfill = bfill(self: 'NDFrameT', axis: 'None | Axis' = None, inplace: 'bool_t' = False, limit: 'None | int' = None, downcast=None) -> 'NDFrameT | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='bfill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- between_time(self: 'NDFrameT', start_time, end_time, include_start: 'bool_t | lib.NoDefault' = <no_default>, include_end: 'bool_t | lib.NoDefault' = <no_default>, inclusive: 'str | None' = None, axis=None) -> 'NDFrameT'
- Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting ``start_time`` to be later than ``end_time``,
you can get the times that are *not* between the two times.
Parameters
----------
start_time : datetime.time or str
Initial time as a time filter limit.
end_time : datetime.time or str
End time as a time filter limit.
include_start : bool, default True
Whether the start time needs to be included in the result.
.. deprecated:: 1.4.0
Arguments `include_start` and `include_end` have been deprecated
to standardize boundary inputs. Use `inclusive` instead, to set
each bound as closed or open.
include_end : bool, default True
Whether the end time needs to be included in the result.
.. deprecated:: 1.4.0
Arguments `include_start` and `include_end` have been deprecated
to standardize boundary inputs. Use `inclusive` instead, to set
each bound as closed or open.
inclusive : {"both", "neither", "left", "right"}, default "both"
Include boundaries; whether to set each bound as closed or open.
axis : {0 or 'index', 1 or 'columns'}, default 0
Determine range time on index or columns value.
Returns
-------
Series or DataFrame
Data from the original object filtered to the specified dates range.
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
at_time : Select values at a particular time of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_between_time : Get just the index locations for
values between particular times of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 00:00:00 1
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
2018-04-12 01:00:00 4
>>> ts.between_time('0:15', '0:45')
A
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
You get the times that are *not* between two times by setting
``start_time`` later than ``end_time``:
>>> ts.between_time('0:45', '0:15')
A
2018-04-09 00:00:00 1
2018-04-12 01:00:00 4
- bool(self)
- Return the bool of a single element Series or DataFrame.
This must be a boolean scalar value, either True or False. It will raise a
ValueError if the Series or DataFrame does not have exactly 1 element, or that
element is not boolean (integer values 0 and 1 will also raise an exception).
Returns
-------
bool
The value in the Series or DataFrame.
See Also
--------
Series.astype : Change the data type of a Series, including to boolean.
DataFrame.astype : Change the data type of a DataFrame, including to boolean.
numpy.bool_ : NumPy boolean data type, used by pandas for boolean values.
Examples
--------
The method will only work for single element objects with a boolean value:
>>> pd.Series([True]).bool()
True
>>> pd.Series([False]).bool()
False
>>> pd.DataFrame({'col': [True]}).bool()
True
>>> pd.DataFrame({'col': [False]}).bool()
False
- convert_dtypes(self: 'NDFrameT', infer_objects: 'bool_t' = True, convert_string: 'bool_t' = True, convert_integer: 'bool_t' = True, convert_boolean: 'bool_t' = True, convert_floating: 'bool_t' = True) -> 'NDFrameT'
- Convert columns to best possible dtypes using dtypes supporting ``pd.NA``.
.. versionadded:: 1.0.0
Parameters
----------
infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.
convert_string : bool, default True
Whether object dtypes should be converted to ``StringDtype()``.
convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.
convert_boolean : bool, defaults True
Whether object dtypes should be converted to ``BooleanDtypes()``.
convert_floating : bool, defaults True
Whether, if possible, conversion can be done to floating extension types.
If `convert_integer` is also True, preference will be give to integer
dtypes if the floats can be faithfully casted to integers.
.. versionadded:: 1.2.0
Returns
-------
Series or DataFrame
Copy of input object with new dtype.
See Also
--------
infer_objects : Infer dtypes of objects.
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
Notes
-----
By default, ``convert_dtypes`` will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support ``pd.NA``. By using the options
``convert_string``, ``convert_integer``, ``convert_boolean`` and
``convert_boolean``, it is possible to turn off individual conversions
to ``StringDtype``, the integer extension types, ``BooleanDtype``
or floating extension types, respectively.
For object-dtyped columns, if ``infer_objects`` is ``True``, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to ``StringDtype``, ``BooleanDtype`` or an appropriate integer
or floating extension type, otherwise leave as ``object``.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
.. versionchanged:: 1.2
Starting with pandas 1.2, this method also converts float columns
to the nullable floating extension type.
In the future, as new dtypes are added that support ``pd.NA``, the results
of this method will change to support those new dtypes.
Examples
--------
>>> df = pd.DataFrame(
... {
... "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
... "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
... "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
... "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
... "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
... "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
... }
... )
Start with a DataFrame with default dtypes.
>>> df
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
>>> df.dtypes
a int32
b object
c object
d object
e float64
f float64
dtype: object
Convert the DataFrame to use best possible dtypes.
>>> dfn = df.convert_dtypes()
>>> dfn
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
>>> dfn.dtypes
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
Start with a Series of strings and missing data represented by ``np.nan``.
>>> s = pd.Series(["a", "b", np.nan])
>>> s
0 a
1 b
2 NaN
dtype: object
Obtain a Series with dtype ``StringDtype``.
>>> s.convert_dtypes()
0 a
1 b
2 <NA>
dtype: string
- copy(self: 'NDFrameT', deep: 'bool_t' = True) -> 'NDFrameT'
- Make a copy of this object's indices and data.
When ``deep=True`` (default), a new object will be created with a
copy of the calling object's data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
When ``deep=False``, a new object will be created without copying
the calling object's data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).
Parameters
----------
deep : bool, default True
Make a deep copy, including a copy of the data and the indices.
With ``deep=False`` neither the indices nor the data are copied.
Returns
-------
copy : Series or DataFrame
Object type matches caller.
Notes
-----
When ``deep=True``, data is copied but actual Python objects
will not be copied recursively, only the reference to the object.
This is in contrast to `copy.deepcopy` in the Standard Library,
which recursively copies object data (see examples below).
While ``Index`` objects are copied when ``deep=True``, the underlying
numpy array is not copied for performance reasons. Since ``Index`` is
immutable, the underlying data can be safely shared and a copy
is not needed.
Examples
--------
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a 1
b 2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a 1
b 2
dtype: int64
**Shallow copy versus default (deep) copy:**
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)
Shallow copy shares data and index with original.
>>> s is shallow
False
>>> s.values is shallow.values and s.index is shallow.index
True
Deep copy has own copy of data and index.
>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False
Updates to the data shared by shallow copy and original is reflected
in both; deep copy remains unchanged.
>>> s[0] = 3
>>> shallow[1] = 4
>>> s
a 3
b 4
dtype: int64
>>> shallow
a 3
b 4
dtype: int64
>>> deep
a 1
b 2
dtype: int64
Note that when copying an object containing Python objects, a deep copy
will copy the data, but will not do so recursively. Updating a nested
data object will be reflected in the deep copy.
>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0 [10, 2]
1 [3, 4]
dtype: object
>>> deep
0 [10, 2]
1 [3, 4]
dtype: object
- describe(self: 'NDFrameT', percentiles=None, include=None, exclude=None, datetime_is_numeric=False) -> 'NDFrameT'
- Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset's distribution, excluding ``NaN`` values.
Analyzes both numeric and object series, as well
as ``DataFrame`` column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
Parameters
----------
percentiles : list-like of numbers, optional
The percentiles to include in the output. All should
fall between 0 and 1. The default is
``[.25, .5, .75]``, which returns the 25th, 50th, and
75th percentiles.
include : 'all', list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored
for ``Series``. Here are the options:
- 'all' : All columns of the input will be included in the output.
- A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
``numpy.number``. To limit it instead to object columns submit
the ``numpy.object`` data type. Strings
can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
select pandas categorical columns, use ``'category'``
- None (default) : The result will include all numeric columns.
exclude : list-like of dtypes or None (default), optional,
A black list of data types to omit from the result. Ignored
for ``Series``. Here are the options:
- A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
``numpy.number``. To exclude object columns submit the data
type ``numpy.object``. Strings can also be used in the style of
``select_dtypes`` (e.g. ``df.describe(exclude=['O'])``). To
exclude pandas categorical columns, use ``'category'``
- None (default) : The result will exclude nothing.
datetime_is_numeric : bool, default False
Whether to treat datetime dtypes as numeric. This affects statistics
calculated for the column. For DataFrame input, this also
controls whether datetime columns are included by default.
.. versionadded:: 1.1.0
Returns
-------
Series or DataFrame
Summary statistics of the Series or Dataframe provided.
See Also
--------
DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.select_dtypes: Subset of a DataFrame including/excluding
columns based on their dtype.
Notes
-----
For numeric data, the result's index will include ``count``,
``mean``, ``std``, ``min``, ``max`` as well as lower, ``50`` and
upper percentiles. By default the lower percentile is ``25`` and the
upper percentile is ``75``. The ``50`` percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result's index
will include ``count``, ``unique``, ``top``, and ``freq``. The ``top``
is the most common value. The ``freq`` is the most common value's
frequency. Timestamps also include the ``first`` and ``last`` items.
If multiple object values have the highest count, then the
``count`` and ``top`` results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a ``DataFrame``, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object and categorical data without any numeric columns, the
default is to return an analysis of both the object and categorical
columns. If ``include='all'`` is provided as an option, the result
will include a union of attributes of each type.
The `include` and `exclude` parameters can be used to limit
which columns in a ``DataFrame`` are analyzed for the output.
The parameters are ignored when analyzing a ``Series``.
Examples
--------
Describing a numeric ``Series``.
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
Describing a categorical ``Series``.
>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count 4
unique 3
top a
freq 2
dtype: object
Describing a timestamp ``Series``.
>>> s = pd.Series([
... np.datetime64("2000-01-01"),
... np.datetime64("2010-01-01"),
... np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count 3
mean 2006-09-01 08:00:00
min 2000-01-01 00:00:00
25% 2004-12-31 12:00:00
50% 2010-01-01 00:00:00
75% 2010-01-01 00:00:00
max 2010-01-01 00:00:00
dtype: object
Describing a ``DataFrame``. By default only numeric fields
are returned.
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
... 'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Describing all columns of a ``DataFrame`` regardless of data type.
>>> df.describe(include='all') # doctest: +SKIP
categorical numeric object
count 3 3.0 3
unique 3 NaN 3
top f NaN a
freq 1 NaN 1
mean NaN 2.0 NaN
std NaN 1.0 NaN
min NaN 1.0 NaN
25% NaN 1.5 NaN
50% NaN 2.0 NaN
75% NaN 2.5 NaN
max NaN 3.0 NaN
Describing a column from a ``DataFrame`` by accessing it as
an attribute.
>>> df.numeric.describe()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
Including only numeric columns in a ``DataFrame`` description.
>>> df.describe(include=[np.number])
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Including only string columns in a ``DataFrame`` description.
>>> df.describe(include=[object]) # doctest: +SKIP
object
count 3
unique 3
top a
freq 1
Including only categorical columns from a ``DataFrame`` description.
>>> df.describe(include=['category'])
categorical
count 3
unique 3
top d
freq 1
Excluding numeric columns from a ``DataFrame`` description.
>>> df.describe(exclude=[np.number]) # doctest: +SKIP
categorical object
count 3 3
unique 3 3
top f a
freq 1 1
Excluding object columns from a ``DataFrame`` description.
>>> df.describe(exclude=[object]) # doctest: +SKIP
categorical numeric
count 3 3.0
unique 3 NaN
top f NaN
freq 1 NaN
mean NaN 2.0
std NaN 1.0
min NaN 1.0
25% NaN 1.5
50% NaN 2.0
75% NaN 2.5
max NaN 3.0
- droplevel(self: 'NDFrameT', level, axis=0) -> 'NDFrameT'
- Return Series/DataFrame with requested index / column level(s) removed.
Parameters
----------
level : int, str, or list-like
If a string is given, must be the name of a level
If list-like, elements must be names or positional indexes
of levels.
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis along which the level(s) is removed:
* 0 or 'index': remove level(s) in column.
* 1 or 'columns': remove level(s) in row.
Returns
-------
Series/DataFrame
Series/DataFrame with requested index / column level(s) removed.
Examples
--------
>>> df = pd.DataFrame([
... [1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
>>> df.columns = pd.MultiIndex.from_tuples([
... ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
>>> df
level_1 c d
level_2 e f
a b
1 2 3 4
5 6 7 8
9 10 11 12
>>> df.droplevel('a')
level_1 c d
level_2 e f
b
2 3 4
6 7 8
10 11 12
>>> df.droplevel('level_2', axis=1)
level_1 c d
a b
1 2 3 4
5 6 7 8
9 10 11 12
- equals(self, other: 'object') -> 'bool_t'
- Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against
each other to see if they have the same shape and elements. NaNs in
the same location are considered equal.
The row/column index do not need to have the same type, as long
as the values are considered equal. Corresponding columns must be of
the same dtype.
Parameters
----------
other : Series or DataFrame
The other Series or DataFrame to be compared with the first.
Returns
-------
bool
True if all elements are the same in both objects, False
otherwise.
See Also
--------
Series.eq : Compare two Series objects of the same length
and return a Series where each element is True if the element
in each Series is equal, False otherwise.
DataFrame.eq : Compare two DataFrame objects of the same shape and
return a DataFrame where each element is True if the respective
element in each DataFrame is equal, False otherwise.
testing.assert_series_equal : Raises an AssertionError if left and
right are not equal. Provides an easy interface to ignore
inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal : Like assert_series_equal, but targets
DataFrames.
numpy.array_equal : Return True if two arrays have the same shape
and elements, False otherwise.
Examples
--------
>>> df = pd.DataFrame({1: [10], 2: [20]})
>>> df
1 2
0 10 20
DataFrames df and exactly_equal have the same types and values for
their elements and column labels, which will return True.
>>> exactly_equal = pd.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
1 2
0 10 20
>>> df.equals(exactly_equal)
True
DataFrames df and different_column_type have the same element
types and values, but have different types for the column labels,
which will still return True.
>>> different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
1.0 2.0
0 10 20
>>> df.equals(different_column_type)
True
DataFrames df and different_data_type have different types for the
same values for their elements, and will return False even though
their column labels are the same values and types.
>>> different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})
>>> different_data_type
1 2
0 10.0 20.0
>>> df.equals(different_data_type)
False
- ewm(self, com: 'float | None' = None, span: 'float | None' = None, halflife: 'float | TimedeltaConvertibleTypes | None' = None, alpha: 'float | None' = None, min_periods: 'int | None' = 0, adjust: 'bool_t' = True, ignore_na: 'bool_t' = False, axis: 'Axis' = 0, times: 'str | np.ndarray | DataFrame | Series | None' = None, method: 'str' = 'single') -> 'ExponentialMovingWindow'
- Provide exponentially weighted (EW) calculations.
Exactly one parameter: ``com``, ``span``, ``halflife``, or ``alpha`` must be
provided.
Parameters
----------
com : float, optional
Specify decay in terms of center of mass
:math:`\alpha = 1 / (1 + com)`, for :math:`com \geq 0`.
span : float, optional
Specify decay in terms of span
:math:`\alpha = 2 / (span + 1)`, for :math:`span \geq 1`.
halflife : float, str, timedelta, optional
Specify decay in terms of half-life
:math:`\alpha = 1 - \exp\left(-\ln(2) / halflife\right)`, for
:math:`halflife > 0`.
If ``times`` is specified, the time unit (str or timedelta) over which an
observation decays to half its value. Only applicable to ``mean()``,
and halflife value will not apply to the other functions.
.. versionadded:: 1.1.0
alpha : float, optional
Specify smoothing factor :math:`\alpha` directly
:math:`0 < \alpha \leq 1`.
min_periods : int, default 0
Minimum number of observations in window required to have a value;
otherwise, result is ``np.nan``.
adjust : bool, default True
Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings (viewing EWMA as a moving average).
- When ``adjust=True`` (default), the EW function is calculated using weights
:math:`w_i = (1 - \alpha)^i`. For example, the EW moving average of the series
[:math:`x_0, x_1, ..., x_t`] would be:
.. math::
y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 -
\alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}
- When ``adjust=False``, the exponentially weighted function is calculated
recursively:
.. math::
\begin{split}
y_0 &= x_0\\
y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,
\end{split}
ignore_na : bool, default False
Ignore missing values when calculating weights.
- When ``ignore_na=False`` (default), weights are based on absolute positions.
For example, the weights of :math:`x_0` and :math:`x_2` used in calculating
the final weighted average of [:math:`x_0`, None, :math:`x_2`] are
:math:`(1-\alpha)^2` and :math:`1` if ``adjust=True``, and
:math:`(1-\alpha)^2` and :math:`\alpha` if ``adjust=False``.
- When ``ignore_na=True``, weights are based
on relative positions. For example, the weights of :math:`x_0` and :math:`x_2`
used in calculating the final weighted average of
[:math:`x_0`, None, :math:`x_2`] are :math:`1-\alpha` and :math:`1` if
``adjust=True``, and :math:`1-\alpha` and :math:`\alpha` if ``adjust=False``.
axis : {0, 1}, default 0
If ``0`` or ``'index'``, calculate across the rows.
If ``1`` or ``'columns'``, calculate across the columns.
times : str, np.ndarray, Series, default None
.. versionadded:: 1.1.0
Only applicable to ``mean()``.
Times corresponding to the observations. Must be monotonically increasing and
``datetime64[ns]`` dtype.
If 1-D array like, a sequence with the same shape as the observations.
.. deprecated:: 1.4.0
If str, the name of the column in the DataFrame representing the times.
method : str {'single', 'table'}, default 'single'
.. versionadded:: 1.4.0
Execute the rolling operation per single column or row (``'single'``)
or over the entire object (``'table'``).
This argument is only implemented when specifying ``engine='numba'``
in the method call.
Only applicable to ``mean()``
Returns
-------
``ExponentialMovingWindow`` subclass
See Also
--------
rolling : Provides rolling window calculations.
expanding : Provides expanding transformations.
Notes
-----
See :ref:`Windowing Operations <window.exponentially_weighted>`
for further usage details and examples.
Examples
--------
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
>>> df.ewm(com=0.5).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
>>> df.ewm(alpha=2 / 3).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
**adjust**
>>> df.ewm(com=0.5, adjust=True).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
>>> df.ewm(com=0.5, adjust=False).mean()
B
0 0.000000
1 0.666667
2 1.555556
3 1.555556
4 3.650794
**ignore_na**
>>> df.ewm(com=0.5, ignore_na=True).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.225000
>>> df.ewm(com=0.5, ignore_na=False).mean()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
**times**
Exponentially weighted mean with weights calculated with a timedelta ``halflife``
relative to ``times``.
>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
B
0 0.000000
1 0.585786
2 1.523889
3 1.523889
4 3.233686
- expanding(self, min_periods: 'int' = 1, center: 'bool_t | None' = None, axis: 'Axis' = 0, method: 'str' = 'single') -> 'Expanding'
- Provide expanding window calculations.
Parameters
----------
min_periods : int, default 1
Minimum number of observations in window required to have a value;
otherwise, result is ``np.nan``.
center : bool, default False
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
.. deprecated:: 1.1.0
axis : int or str, default 0
If ``0`` or ``'index'``, roll across the rows.
If ``1`` or ``'columns'``, roll across the columns.
method : str {'single', 'table'}, default 'single'
Execute the rolling operation per single column or row (``'single'``)
or over the entire object (``'table'``).
This argument is only implemented when specifying ``engine='numba'``
in the method call.
.. versionadded:: 1.3.0
Returns
-------
``Expanding`` subclass
See Also
--------
rolling : Provides rolling window calculations.
ewm : Provides exponential weighted functions.
Notes
-----
See :ref:`Windowing Operations <window.expanding>` for further usage details
and examples.
Examples
--------
>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
**min_periods**
Expanding sum with 1 vs 3 observations needed to calculate a value.
>>> df.expanding(1).sum()
B
0 0.0
1 1.0
2 3.0
3 3.0
4 7.0
>>> df.expanding(3).sum()
B
0 NaN
1 NaN
2 3.0
3 3.0
4 7.0
- filter(self: 'NDFrameT', items=None, like: 'str | None' = None, regex: 'str | None' = None, axis=None) -> 'NDFrameT'
- Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
Parameters
----------
items : list-like
Keep labels from axis which are in items.
like : str
Keep labels from axis for which "like in label == True".
regex : str (regular expression)
Keep labels from axis for which re.search(regex, label) == True.
axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis,
'index' for Series, 'columns' for DataFrame.
Returns
-------
same type as input object
See Also
--------
DataFrame.loc : Access a group of rows and columns
by label(s) or a boolean array.
Notes
-----
The ``items``, ``like``, and ``regex`` parameters are
enforced to be mutually exclusive.
``axis`` defaults to the info axis that is used when indexing
with ``[]``.
Examples
--------
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
... index=['mouse', 'rabbit'],
... columns=['one', 'two', 'three'])
>>> df
one two three
mouse 1 2 3
rabbit 4 5 6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
one three
mouse 1 3
rabbit 4 6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
one three
mouse 1 3
rabbit 4 6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
one two three
rabbit 4 5 6
- first(self: 'NDFrameT', offset) -> 'NDFrameT'
- Select initial periods of time series data based on a date offset.
When having a DataFrame with dates as index, this function can
select the first few rows based on a date offset.
Parameters
----------
offset : str, DateOffset or dateutil.relativedelta
The offset length of the data that will be selected. For instance,
'1M' will display all the rows having their index within the first month.
Returns
-------
Series or DataFrame
A subset of the caller.
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
last : Select final periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 1
2018-04-11 2
2018-04-13 3
2018-04-15 4
Get the rows for the first 3 days:
>>> ts.first('3D')
A
2018-04-09 1
2018-04-11 2
Notice the data for 3 first calendar days were returned, not the first
3 days observed in the dataset, and therefore data for 2018-04-13 was
not returned.
- first_valid_index(self) -> 'Hashable | None'
- Return index for first non-NA value or None, if no non-NA value is found.
Returns
-------
scalar : type of index
Notes
-----
If all elements are non-NA/null, returns None.
Also returns None for empty Series/DataFrame.
- get(self, key, default=None)
- Get item from object for given key (ex: DataFrame column).
Returns default value if not found.
Parameters
----------
key : object
Returns
-------
value : same type as items contained in object
Examples
--------
>>> df = pd.DataFrame(
... [
... [24.3, 75.7, "high"],
... [31, 87.8, "high"],
... [22, 71.6, "medium"],
... [35, 95, "medium"],
... ],
... columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
... index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"),
... )
>>> df
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31.0 87.8 high
2014-02-14 22.0 71.6 medium
2014-02-15 35.0 95.0 medium
>>> df.get(["temp_celsius", "windspeed"])
temp_celsius windspeed
2014-02-12 24.3 high
2014-02-13 31.0 high
2014-02-14 22.0 medium
2014-02-15 35.0 medium
If the key isn't found, the default value will be used.
>>> df.get(["temp_celsius", "temp_kelvin"], default="default_value")
'default_value'
- head(self: 'NDFrameT', n: 'int' = 5) -> 'NDFrameT'
- Return the first `n` rows.
This function returns the first `n` rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of `n`, this function returns all rows except
the last `n` rows, equivalent to ``df[:-n]``.
Parameters
----------
n : int, default 5
Number of rows to select.
Returns
-------
same type as caller
The first `n` rows of the caller object.
See Also
--------
DataFrame.tail: Returns the last `n` rows.
Examples
--------
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the first 5 lines
>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
Viewing the first `n` lines (three in this case)
>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
For negative values of `n`
>>> df.head(-3)
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
- infer_objects(self: 'NDFrameT') -> 'NDFrameT'
- Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
Returns
-------
converted : same type as input object
See Also
--------
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to numeric type.
convert_dtypes : Convert argument to best possible dtype.
Examples
--------
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
A
1 1
2 2
3 3
>>> df.dtypes
A object
dtype: object
>>> df.infer_objects().dtypes
A int64
dtype: object
- last(self: 'NDFrameT', offset) -> 'NDFrameT'
- Select final periods of time series data based on a date offset.
For a DataFrame with a sorted DatetimeIndex, this function
selects the last few rows based on a date offset.
Parameters
----------
offset : str, DateOffset, dateutil.relativedelta
The offset length of the data that will be selected. For instance,
'3D' will display all the rows having their index within the last 3 days.
Returns
-------
Series or DataFrame
A subset of the caller.
Raises
------
TypeError
If the index is not a :class:`DatetimeIndex`
See Also
--------
first : Select initial periods of time series based on a date offset.
at_time : Select values at a particular time of the day.
between_time : Select values between particular times of the day.
Examples
--------
>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
A
2018-04-09 1
2018-04-11 2
2018-04-13 3
2018-04-15 4
Get the rows for the last 3 days:
>>> ts.last('3D')
A
2018-04-13 3
2018-04-15 4
Notice the data for 3 last calendar days were returned, not the last
3 observed days in the dataset, and therefore data for 2018-04-11 was
not returned.
- last_valid_index(self) -> 'Hashable | None'
- Return index for last non-NA value or None, if no non-NA value is found.
Returns
-------
scalar : type of index
Notes
-----
If all elements are non-NA/null, returns None.
Also returns None for empty Series/DataFrame.
- pad = ffill(self: 'NDFrameT', axis: 'None | Axis' = None, inplace: 'bool_t' = False, limit: 'None | int' = None, downcast=None) -> 'NDFrameT | None'
- Synonym for :meth:`DataFrame.fillna` with ``method='ffill'``.
Returns
-------
Series/DataFrame or None
Object with missing values filled or None if ``inplace=True``.
- pct_change(self: 'NDFrameT', periods=1, fill_method='pad', limit=None, freq=None, **kwargs) -> 'NDFrameT'
- Percentage change between the current and a prior element.
Computes the percentage change from the immediately previous row by
default. This is useful in comparing the percentage of change in a time
series of elements.
Parameters
----------
periods : int, default 1
Periods to shift for forming percent change.
fill_method : str, default 'pad'
How to handle NAs before computing percent changes.
limit : int, default None
The number of consecutive NAs to fill before stopping.
freq : DateOffset, timedelta, or str, optional
Increment to use from time series API (e.g. 'M' or BDay()).
**kwargs
Additional keyword arguments are passed into
`DataFrame.shift` or `Series.shift`.
Returns
-------
chg : Series or DataFrame
The same type as the calling object.
See Also
--------
Series.diff : Compute the difference of two elements in a Series.
DataFrame.diff : Compute the difference of two elements in a DataFrame.
Series.shift : Shift the index by some number of periods.
DataFrame.shift : Shift the index by some number of periods.
Examples
--------
**Series**
>>> s = pd.Series([90, 91, 85])
>>> s
0 90
1 91
2 85
dtype: int64
>>> s.pct_change()
0 NaN
1 0.011111
2 -0.065934
dtype: float64
>>> s.pct_change(periods=2)
0 NaN
1 NaN
2 -0.055556
dtype: float64
See the percentage change in a Series where filling NAs with last
valid observation forward to next valid.
>>> s = pd.Series([90, 91, None, 85])
>>> s
0 90.0
1 91.0
2 NaN
3 85.0
dtype: float64
>>> s.pct_change(fill_method='ffill')
0 NaN
1 0.011111
2 0.000000
3 -0.065934
dtype: float64
**DataFrame**
Percentage change in French franc, Deutsche Mark, and Italian lira from
1980-01-01 to 1980-03-01.
>>> df = pd.DataFrame({
... 'FR': [4.0405, 4.0963, 4.3149],
... 'GR': [1.7246, 1.7482, 1.8519],
... 'IT': [804.74, 810.01, 860.13]},
... index=['1980-01-01', '1980-02-01', '1980-03-01'])
>>> df
FR GR IT
1980-01-01 4.0405 1.7246 804.74
1980-02-01 4.0963 1.7482 810.01
1980-03-01 4.3149 1.8519 860.13
>>> df.pct_change()
FR GR IT
1980-01-01 NaN NaN NaN
1980-02-01 0.013810 0.013684 0.006549
1980-03-01 0.053365 0.059318 0.061876
Percentage of change in GOOG and APPL stock volume. Shows computing
the percentage change between columns.
>>> df = pd.DataFrame({
... '2016': [1769950, 30586265],
... '2015': [1500923, 40912316],
... '2014': [1371819, 41403351]},
... index=['GOOG', 'APPL'])
>>> df
2016 2015 2014
GOOG 1769950 1500923 1371819
APPL 30586265 40912316 41403351
>>> df.pct_change(axis='columns', periods=-1)
2016 2015 2014
GOOG 0.179241 0.094112 NaN
APPL -0.252395 -0.011860 NaN
- pipe(self, func: 'Callable[..., T] | tuple[Callable[..., T], str]', *args, **kwargs) -> 'T'
- Apply chainable functions that expect Series or DataFrames.
Parameters
----------
func : function
Function to apply to the Series/DataFrame.
``args``, and ``kwargs`` are passed into ``func``.
Alternatively a ``(callable, data_keyword)`` tuple where
``data_keyword`` is a string indicating the keyword of
``callable`` that expects the Series/DataFrame.
args : iterable, optional
Positional arguments passed into ``func``.
kwargs : mapping, optional
A dictionary of keyword arguments passed into ``func``.
Returns
-------
object : the return type of ``func``.
See Also
--------
DataFrame.apply : Apply a function along input axis of DataFrame.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
Series.map : Apply a mapping correspondence on a
:class:`~pandas.Series`.
Notes
-----
Use ``.pipe`` when chaining together functions that expect
Series, DataFrames or GroupBy objects. Instead of writing
>>> func(g(h(df), arg1=a), arg2=b, arg3=c) # doctest: +SKIP
You can write
>>> (df.pipe(h)
... .pipe(g, arg1=a)
... .pipe(func, arg2=b, arg3=c)
... ) # doctest: +SKIP
If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
data. For example, suppose ``f`` takes its data as ``arg2``:
>>> (df.pipe(h)
... .pipe(g, arg1=a)
... .pipe((func, 'arg2'), arg1=a, arg3=c)
... ) # doctest: +SKIP
- rank(self: 'NDFrameT', axis=0, method: 'str' = 'average', numeric_only: 'bool_t | None | lib.NoDefault' = <no_default>, na_option: 'str' = 'keep', ascending: 'bool_t' = True, pct: 'bool_t' = False) -> 'NDFrameT'
- Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the
ranks of those values.
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
Index to direct ranking.
method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like 'min', but rank always increases by 1 between groups.
numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
na_option : {'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign lowest rank to NaN values
* bottom: assign highest rank to NaN values
ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
pct : bool, default False
Whether or not to display the returned rankings in percentile
form.
Returns
-------
same type as caller
Return a Series or DataFrame with data ranks as values.
See Also
--------
core.groupby.GroupBy.rank : Rank of values within each group.
Examples
--------
>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
... 'spider', 'snake'],
... 'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
Animal Number_legs
0 cat 4.0
1 penguin 2.0
2 dog 4.0
3 spider 8.0
4 snake NaN
The following example shows how the method behaves with the above
parameters:
* default_rank: this is the default behaviour obtained without using
any parameter.
* max_rank: setting ``method = 'max'`` the records that have the
same values are ranked using the highest rank (e.g.: since 'cat'
and 'dog' are both in the 2nd and 3rd position, rank 3 is assigned.)
* NA_bottom: choosing ``na_option = 'bottom'``, if there are records
with NaN values they are placed at the bottom of the ranking.
* pct_rank: when setting ``pct = True``, the ranking is expressed as
percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
Animal Number_legs default_rank max_rank NA_bottom pct_rank
0 cat 4.0 2.5 3.0 2.5 0.625
1 penguin 2.0 1.0 1.0 1.0 0.250
2 dog 4.0 2.5 3.0 2.5 0.625
3 spider 8.0 4.0 4.0 4.0 1.000
4 snake NaN NaN NaN 5.0 NaN
- reindex_like(self: 'NDFrameT', other, method: 'str | None' = None, copy: 'bool_t' = True, limit=None, tolerance=None) -> 'NDFrameT'
- Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
Parameters
----------
other : Object of the same data type
Its row and column indices are used to define the new indices
of this object.
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don't fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap.
copy : bool, default True
Return a new object, even if the passed indexes are the same.
limit : int, default None
Maximum number of consecutive labels to fill for inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
Series or DataFrame
Same type as caller, but with changed indices on each axis.
See Also
--------
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex : Change to new indices or expand indices.
Notes
-----
Same as calling
``.reindex(index=other.index, columns=other.columns,...)``.
Examples
--------
>>> df1 = pd.DataFrame([[24.3, 75.7, 'high'],
... [31, 87.8, 'high'],
... [22, 71.6, 'medium'],
... [35, 95, 'medium']],
... columns=['temp_celsius', 'temp_fahrenheit',
... 'windspeed'],
... index=pd.date_range(start='2014-02-12',
... end='2014-02-15', freq='D'))
>>> df1
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31.0 87.8 high
2014-02-14 22.0 71.6 medium
2014-02-15 35.0 95.0 medium
>>> df2 = pd.DataFrame([[28, 'low'],
... [30, 'low'],
... [35.1, 'medium']],
... columns=['temp_celsius', 'windspeed'],
... index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
... '2014-02-15']))
>>> df2
temp_celsius windspeed
2014-02-12 28.0 low
2014-02-13 30.0 low
2014-02-15 35.1 medium
>>> df2.reindex_like(df1)
temp_celsius temp_fahrenheit windspeed
2014-02-12 28.0 NaN low
2014-02-13 30.0 NaN low
2014-02-14 NaN NaN NaN
2014-02-15 35.1 NaN medium
- rename_axis(self, mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False)
- Set the name of the axis for the index or columns.
Parameters
----------
mapper : scalar, list-like, optional
Value to set the axis name attribute.
index, columns : scalar, list-like, dict-like or function, optional
A scalar, list-like, dict-like or functions transformations to
apply to that axis' values.
Note that the ``columns`` parameter is not allowed if the
object is a Series. This parameter only apply for DataFrame
type objects.
Use either ``mapper`` and ``axis`` to
specify the axis to target with ``mapper``, or ``index``
and/or ``columns``.
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis to rename.
copy : bool, default True
Also copy underlying data.
inplace : bool, default False
Modifies the object directly, instead of creating a new Series
or DataFrame.
Returns
-------
Series, DataFrame, or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Series.rename : Alter Series index labels or name.
DataFrame.rename : Alter DataFrame index labels or name.
Index.rename : Set new names on index.
Notes
-----
``DataFrame.rename_axis`` supports two calling conventions
* ``(index=index_mapper, columns=columns_mapper, ...)``
* ``(mapper, axis={'index', 'columns'}, ...)``
The first calling convention will only modify the names of
the index and/or the names of the Index object that is the columns.
In this case, the parameter ``copy`` is ignored.
The second calling convention will modify the names of the
corresponding index if mapper is a list or a scalar.
However, if mapper is dict-like or a function, it will use the
deprecated behavior of modifying the axis *labels*.
We *highly* recommend using keyword arguments to clarify your
intent.
Examples
--------
**Series**
>>> s = pd.Series(["dog", "cat", "monkey"])
>>> s
0 dog
1 cat
2 monkey
dtype: object
>>> s.rename_axis("animal")
animal
0 dog
1 cat
2 monkey
dtype: object
**DataFrame**
>>> df = pd.DataFrame({"num_legs": [4, 4, 2],
... "num_arms": [0, 0, 2]},
... ["dog", "cat", "monkey"])
>>> df
num_legs num_arms
dog 4 0
cat 4 0
monkey 2 2
>>> df = df.rename_axis("animal")
>>> df
num_legs num_arms
animal
dog 4 0
cat 4 0
monkey 2 2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df
limbs num_legs num_arms
animal
dog 4 0
cat 4 0
monkey 2 2
**MultiIndex**
>>> df.index = pd.MultiIndex.from_product([['mammal'],
... ['dog', 'cat', 'monkey']],
... names=['type', 'name'])
>>> df
limbs num_legs num_arms
type name
mammal dog 4 0
cat 4 0
monkey 2 2
>>> df.rename_axis(index={'type': 'class'})
limbs num_legs num_arms
class name
mammal dog 4 0
cat 4 0
monkey 2 2
>>> df.rename_axis(columns=str.upper)
LIMBS num_legs num_arms
type name
mammal dog 4 0
cat 4 0
monkey 2 2
- rolling(self, window: 'int | timedelta | BaseOffset | BaseIndexer', min_periods: 'int | None' = None, center: 'bool_t' = False, win_type: 'str | None' = None, on: 'str | None' = None, axis: 'Axis' = 0, closed: 'str | None' = None, method: 'str' = 'single')
- Provide rolling window calculations.
Parameters
----------
window : int, offset, or BaseIndexer subclass
Size of the moving window.
If an integer, the fixed number of observations used for
each window.
If an offset, the time period of each window. Each
window will be a variable sized based on the observations included in
the time-period. This is only valid for datetimelike indexes.
To learn more about the offsets & frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__.
If a BaseIndexer subclass, the window boundaries
based on the defined ``get_window_bounds`` method. Additional rolling
keyword arguments, namely ``min_periods``, ``center``, and
``closed`` will be passed to ``get_window_bounds``.
min_periods : int, default None
Minimum number of observations in window required to have a value;
otherwise, result is ``np.nan``.
For a window that is specified by an offset, ``min_periods`` will default to 1.
For a window that is specified by an integer, ``min_periods`` will default
to the size of the window.
center : bool, default False
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
win_type : str, default None
If ``None``, all points are evenly weighted.
If a string, it must be a valid `scipy.signal window function
<https://docs.scipy.org/doc/scipy/reference/signal.windows.html#module-scipy.signal.windows>`__.
Certain Scipy window types require additional parameters to be passed
in the aggregation function. The additional parameters must match
the keywords specified in the Scipy window type method signature.
on : str, optional
For a DataFrame, a column label or Index level on which
to calculate the rolling window, rather than the DataFrame's index.
Provided integer column is ignored and excluded from result since
an integer index is not used to calculate the rolling window.
axis : int or str, default 0
If ``0`` or ``'index'``, roll across the rows.
If ``1`` or ``'columns'``, roll across the columns.
closed : str, default None
If ``'right'``, the first point in the window is excluded from calculations.
If ``'left'``, the last point in the window is excluded from calculations.
If ``'both'``, the no points in the window are excluded from calculations.
If ``'neither'``, the first and last points in the window are excluded
from calculations.
Default ``None`` (``'right'``).
.. versionchanged:: 1.2.0
The closed parameter with fixed windows is now supported.
method : str {'single', 'table'}, default 'single'
.. versionadded:: 1.3.0
Execute the rolling operation per single column or row (``'single'``)
or over the entire object (``'table'``).
This argument is only implemented when specifying ``engine='numba'``
in the method call.
Returns
-------
``Window`` subclass if a ``win_type`` is passed
``Rolling`` subclass if ``win_type`` is not passed
See Also
--------
expanding : Provides expanding transformations.
ewm : Provides exponential weighted functions.
Notes
-----
See :ref:`Windowing Operations <window.generic>` for further usage details
and examples.
Examples
--------
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
**window**
Rolling sum with a window length of 2 observations.
>>> df.rolling(2).sum()
B
0 NaN
1 1.0
2 3.0
3 NaN
4 NaN
Rolling sum with a window span of 2 seconds.
>>> df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
... index = [pd.Timestamp('20130101 09:00:00'),
... pd.Timestamp('20130101 09:00:02'),
... pd.Timestamp('20130101 09:00:03'),
... pd.Timestamp('20130101 09:00:05'),
... pd.Timestamp('20130101 09:00:06')])
>>> df_time
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 2.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
>>> df_time.rolling('2s').sum()
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 3.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
Rolling sum with forward looking windows with 2 observations.
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
>>> df.rolling(window=indexer, min_periods=1).sum()
B
0 1.0
1 3.0
2 2.0
3 4.0
4 4.0
**min_periods**
Rolling sum with a window length of 2 observations, but only needs a minimum of 1
observation to calculate a value.
>>> df.rolling(2, min_periods=1).sum()
B
0 0.0
1 1.0
2 3.0
3 2.0
4 4.0
**center**
Rolling sum with the result assigned to the center of the window index.
>>> df.rolling(3, min_periods=1, center=True).sum()
B
0 1.0
1 3.0
2 3.0
3 6.0
4 4.0
>>> df.rolling(3, min_periods=1, center=False).sum()
B
0 0.0
1 1.0
2 3.0
3 3.0
4 6.0
**win_type**
Rolling sum with a window length of 2, using the Scipy ``'gaussian'``
window type. ``std`` is required in the aggregation function.
>>> df.rolling(2, win_type='gaussian').sum(std=3)
B
0 NaN
1 0.986207
2 2.958621
3 NaN
4 NaN
- sample(self: 'NDFrameT', n: 'int | None' = None, frac: 'float | None' = None, replace: 'bool_t' = False, weights=None, random_state: 'RandomState | None' = None, axis: 'Axis | None' = None, ignore_index: 'bool_t' = False) -> 'NDFrameT'
- Return a random sample of items from an axis of object.
You can use `random_state` for reproducibility.
Parameters
----------
n : int, optional
Number of items from axis to return. Cannot be used with `frac`.
Default = 1 if `frac` = None.
frac : float, optional
Fraction of axis items to return. Cannot be used with `n`.
replace : bool, default False
Allow or disallow sampling of the same row more than once.
weights : str or ndarray-like, optional
Default 'None' results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator.
If np.random.RandomState or np.random.Generator, use as given.
.. versionchanged:: 1.1.0
array-like and BitGenerator object now passed to np.random.RandomState()
as seed
.. versionchanged:: 1.4.0
np.random.Generator objects now accepted
axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis
for given data type (0 for Series and DataFrames).
ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.3.0
Returns
-------
Series or DataFrame
A new object of same type as caller containing `n` items randomly
sampled from the caller object.
See Also
--------
DataFrameGroupBy.sample: Generates random samples from each group of a
DataFrame object.
SeriesGroupBy.sample: Generates random samples from each group of a
Series object.
numpy.random.choice: Generates a random sample from a given 1-D numpy
array.
Notes
-----
If `frac` > 1, `replacement` should be set to `True`.
Examples
--------
>>> df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
Extract 3 random elements from the ``Series`` ``df['num_legs']``:
Note that we use `random_state` to ensure the reproducibility of
the examples.
>>> df['num_legs'].sample(n=3, random_state=1)
fish 0
spider 8
falcon 2
Name: num_legs, dtype: int64
A random 50% sample of the ``DataFrame`` with replacement:
>>> df.sample(frac=0.5, replace=True, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
An upsample sample of the ``DataFrame`` with replacement:
Note that `replace` parameter has to be `True` for `frac` parameter > 1.
>>> df.sample(frac=2, replace=True, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
falcon 2 2 10
falcon 2 2 10
fish 0 0 8
dog 4 0 2
fish 0 0 8
dog 4 0 2
Using a DataFrame column as weights. Rows with larger value in the
`num_specimen_seen` column are more likely to be sampled.
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1)
num_legs num_wings num_specimen_seen
falcon 2 2 10
fish 0 0 8
- set_flags(self: 'NDFrameT', *, copy: 'bool_t' = False, allows_duplicate_labels: 'bool_t | None' = None) -> 'NDFrameT'
- Return a new object with updated flags.
Parameters
----------
allows_duplicate_labels : bool, optional
Whether the returned object allows duplicate labels.
Returns
-------
Series or DataFrame
The same type as the caller.
See Also
--------
DataFrame.attrs : Global metadata applying to this dataset.
DataFrame.flags : Global flags applying to this object.
Notes
-----
This method returns a new object that's a view on the same data
as the input. Mutating the input or the output values will be reflected
in the other.
This method is intended to be used in method chains.
"Flags" differ from "metadata". Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in :attr:`DataFrame.attrs`.
Examples
--------
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags.allows_duplicate_labels
True
>>> df2 = df.set_flags(allows_duplicate_labels=False)
>>> df2.flags.allows_duplicate_labels
False
- slice_shift(self: 'NDFrameT', periods: 'int' = 1, axis=0) -> 'NDFrameT'
- Equivalent to `shift` without copying data.
The shifted data will not include the dropped periods and the
shifted axis will be smaller than the original.
.. deprecated:: 1.2.0
slice_shift is deprecated,
use DataFrame/Series.shift instead.
Parameters
----------
periods : int
Number of periods to move, can be positive or negative.
Returns
-------
shifted : same type as caller
Notes
-----
While the `slice_shift` is faster than `shift`, you may pay for it
later during alignment.
- squeeze(self, axis=None)
- Squeeze 1 dimensional axis objects into scalars.
Series or DataFrames with a single element are squeezed to a scalar.
DataFrames with a single column or a single row are squeezed to a
Series. Otherwise the object is unchanged.
This method is most useful when you don't know if your
object is a Series or DataFrame, but you do know it has just a single
column. In that case you can safely call `squeeze` to ensure you have a
Series.
Parameters
----------
axis : {0 or 'index', 1 or 'columns', None}, default None
A specific axis to squeeze. By default, all length-1 axes are
squeezed.
Returns
-------
DataFrame, Series, or scalar
The projection after squeezing `axis` or all the axes.
See Also
--------
Series.iloc : Integer-location based indexing for selecting scalars.
DataFrame.iloc : Integer-location based indexing for selecting Series.
Series.to_frame : Inverse of DataFrame.squeeze for a
single-column DataFrame.
Examples
--------
>>> primes = pd.Series([2, 3, 5, 7])
Slicing might produce a Series with a single value:
>>> even_primes = primes[primes % 2 == 0]
>>> even_primes
0 2
dtype: int64
>>> even_primes.squeeze()
2
Squeezing objects with more than one value in every axis does nothing:
>>> odd_primes = primes[primes % 2 == 1]
>>> odd_primes
1 3
2 5
3 7
dtype: int64
>>> odd_primes.squeeze()
1 3
2 5
3 7
dtype: int64
Squeezing is even more effective when used with DataFrames.
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> df
a b
0 1 2
1 3 4
Slicing a single column will produce a DataFrame with the columns
having only one value:
>>> df_a = df[['a']]
>>> df_a
a
0 1
1 3
So the columns can be squeezed down, resulting in a Series:
>>> df_a.squeeze('columns')
0 1
1 3
Name: a, dtype: int64
Slicing a single row from a single column will produce a single
scalar DataFrame:
>>> df_0a = df.loc[df.index < 1, ['a']]
>>> df_0a
a
0 1
Squeezing the rows produces a single scalar Series:
>>> df_0a.squeeze('rows')
a 1
Name: 0, dtype: int64
Squeezing all axes will project directly into a scalar:
>>> df_0a.squeeze()
1
- swapaxes(self: 'NDFrameT', axis1, axis2, copy=True) -> 'NDFrameT'
- Interchange axes and swap values axes appropriately.
Returns
-------
y : same as input
- tail(self: 'NDFrameT', n: 'int' = 5) -> 'NDFrameT'
- Return the last `n` rows.
This function returns last `n` rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of `n`, this function returns all rows except
the first `n` rows, equivalent to ``df[n:]``.
Parameters
----------
n : int, default 5
Number of rows to select.
Returns
-------
type of caller
The last `n` rows of the caller object.
See Also
--------
DataFrame.head : The first `n` rows of the caller object.
Examples
--------
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the last 5 lines
>>> df.tail()
animal
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Viewing the last `n` lines (three in this case)
>>> df.tail(3)
animal
6 shark
7 whale
8 zebra
For negative values of `n`
>>> df.tail(-3)
animal
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
- to_clipboard(self, excel: 'bool_t' = True, sep: 'str | None' = None, **kwargs) -> 'None'
- Copy object to the system clipboard.
Write a text representation of object to the system clipboard.
This can be pasted into Excel, for example.
Parameters
----------
excel : bool, default True
Produce output in a csv format for easy pasting into excel.
- True, use the provided separator for csv pasting.
- False, write a string representation of the object to the clipboard.
sep : str, default ``'\t'``
Field delimiter.
**kwargs
These parameters will be passed to DataFrame.to_csv.
See Also
--------
DataFrame.to_csv : Write a DataFrame to a comma-separated values
(csv) file.
read_clipboard : Read text from clipboard and pass to read_csv.
Notes
-----
Requirements for your platform.
- Linux : `xclip`, or `xsel` (with `PyQt4` modules)
- Windows : none
- macOS : none
Examples
--------
Copy the contents of a DataFrame to the clipboard.
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard(sep=',') # doctest: +SKIP
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6
We can omit the index by passing the keyword `index` and setting
it to false.
>>> df.to_clipboard(sep=',', index=False) # doctest: +SKIP
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
- to_csv(self, path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, sep: 'str' = ',', na_rep: 'str' = '', float_format: 'str | None' = None, columns: 'Sequence[Hashable] | None' = None, header: 'bool_t | list[str]' = True, index: 'bool_t' = True, index_label: 'IndexLabel | None' = None, mode: 'str' = 'w', encoding: 'str | None' = None, compression: 'CompressionOptions' = 'infer', quoting: 'int | None' = None, quotechar: 'str' = '"', line_terminator: 'str | None' = None, chunksize: 'int | None' = None, date_format: 'str | None' = None, doublequote: 'bool_t' = True, escapechar: 'str | None' = None, decimal: 'str' = '.', errors: 'str' = 'strict', storage_options: 'StorageOptions' = None) -> 'str | None'
- Write object to a comma-separated values (csv) file.
Parameters
----------
path_or_buf : str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string. If a non-binary file object is passed, it should
be opened with `newline=''`, disabling universal newlines. If a binary
file object is passed, `mode` might need to contain a `'b'`.
.. versionchanged:: 1.2.0
Support for binary file objects was introduced.
sep : str, default ','
String of length 1. Field delimiter for the output file.
na_rep : str, default ''
Missing data representation.
float_format : str, default None
Format string for floating point numbers.
columns : sequence, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, or False, default None
Column label for index column(s) if desired. If None is given, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
mode : str
Python write mode, default 'w'.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'. `encoding` is not supported if `path_or_buf`
is a non-binary file object.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and '%s'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
.. versionchanged:: 1.0.0
May now be a dict with key 'method' as compression mode
and other entries as additional compression options if
compression mode is 'zip'.
.. versionchanged:: 1.1.0
Passing compression options as keys in dict is
supported for compression modes 'gzip', 'bz2', 'zstd', and 'zip'.
.. versionchanged:: 1.2.0
Compression is supported for binary file objects.
.. versionchanged:: 1.2.0
Previous versions forwarded dict entries for 'gzip' to
`gzip.open` instead of `gzip.GzipFile` which prevented
setting `mtime`.
quoting : optional constant from csv module
Defaults to csv.QUOTE_MINIMAL. If you have set a `float_format`
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
quotechar : str, default '\"'
String of length 1. Character used to quote fields.
line_terminator : str, optional
The newline character or character sequence to use in the output
file. Defaults to `os.linesep`, which depends on the OS in which
this method is called ('\\n' for linux, '\\r\\n' for Windows, i.e.).
chunksize : int or None
Rows to write at a time.
date_format : str, default None
Format string for datetime objects.
doublequote : bool, default True
Control quoting of `quotechar` inside a field.
escapechar : str, default None
String of length 1. Character used to escape `sep` and `quotechar`
when appropriate.
decimal : str, default '.'
Character recognized as decimal separator. E.g. use ',' for
European data.
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
.. versionadded:: 1.1.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting csv format as a
string. Otherwise returns None.
See Also
--------
read_csv : Load a CSV file into a DataFrame.
to_excel : Write DataFrame to an Excel file.
Examples
--------
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
... 'mask': ['red', 'purple'],
... 'weapon': ['sai', 'bo staff']})
>>> df.to_csv(index=False)
'name,mask,weapon\nRaphael,red,sai\nDonatello,purple,bo staff\n'
Create 'out.zip' containing 'out.csv'
>>> compression_opts = dict(method='zip',
... archive_name='out.csv') # doctest: +SKIP
>>> df.to_csv('out.zip', index=False,
... compression=compression_opts) # doctest: +SKIP
To write a csv file to a new folder or nested folder you will first
need to create it using either Pathlib or os:
>>> from pathlib import Path # doctest: +SKIP
>>> filepath = Path('folder/subfolder/out.csv') # doctest: +SKIP
>>> filepath.parent.mkdir(parents=True, exist_ok=True) # doctest: +SKIP
>>> df.to_csv(filepath) # doctest: +SKIP
>>> import os # doctest: +SKIP
>>> os.makedirs('folder/subfolder', exist_ok=True) # doctest: +SKIP
>>> df.to_csv('folder/subfolder/out.csv') # doctest: +SKIP
- to_excel(self, excel_writer, sheet_name: 'str' = 'Sheet1', na_rep: 'str' = '', float_format: 'str | None' = None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options: 'StorageOptions' = None) -> 'None'
- Write object to an Excel sheet.
To write a single object to an Excel .xlsx file it is only necessary to
specify a target file name. To write to multiple sheets it is necessary to
create an `ExcelWriter` object with a target file name, and specify a sheet
in the file to write to.
Multiple sheets may be written to by specifying unique `sheet_name`.
With all data written to the file it is necessary to save the changes.
Note that creating an `ExcelWriter` object with a file name that already
exists will result in the contents of the existing file being erased.
Parameters
----------
excel_writer : path-like, file-like, or ExcelWriter object
File path or existing ExcelWriter.
sheet_name : str, default 'Sheet1'
Name of sheet which will contain DataFrame.
na_rep : str, default ''
Missing data representation.
float_format : str, optional
Format string for floating point numbers. For example
``float_format="%.2f"`` will format 0.1234 to 0.12.
columns : sequence or list of str, optional
Columns to write.
header : bool or list of str, default True
Write out the column names. If a list of string is given it is
assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
index_label : str or sequence, optional
Column label for index column(s) if desired. If not specified, and
`header` and `index` are True, then the index names are used. A
sequence should be given if the DataFrame uses MultiIndex.
startrow : int, default 0
Upper left cell row to dump data frame.
startcol : int, default 0
Upper left cell column to dump data frame.
engine : str, optional
Write engine to use, 'openpyxl' or 'xlsxwriter'. You can also set this
via the options ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
``io.excel.xlsm.writer``.
.. deprecated:: 1.2.0
As the `xlwt <https://pypi.org/project/xlwt/>`__ package is no longer
maintained, the ``xlwt`` engine will be removed in a future version
of pandas.
merge_cells : bool, default True
Write MultiIndex and Hierarchical Rows as merged cells.
encoding : str, optional
Encoding of the resulting excel file. Only necessary for xlwt,
other writers support unicode natively.
inf_rep : str, default 'inf'
Representation for infinity (there is no native representation for
infinity in Excel).
verbose : bool, default True
Display more information in the error logs.
freeze_panes : tuple of int (length 2), optional
Specifies the one-based bottommost row and rightmost column that
is to be frozen.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
See Also
--------
to_csv : Write DataFrame to a comma-separated values (csv) file.
ExcelWriter : Class for writing DataFrame objects into excel sheets.
read_excel : Read an Excel file into a pandas DataFrame.
read_csv : Read a comma-separated values (csv) file into DataFrame.
Notes
-----
For compatibility with :meth:`~DataFrame.to_csv`,
to_excel serializes lists and dicts to strings before writing.
Once a workbook has been saved it is not possible to write further
data without rewriting the whole workbook.
Examples
--------
Create, write to and save a workbook:
>>> df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx") # doctest: +SKIP
To specify the sheet name:
>>> df1.to_excel("output.xlsx",
... sheet_name='Sheet_name_1') # doctest: +SKIP
If you wish to write to more than one sheet in the workbook, it is
necessary to specify an ExcelWriter object:
>>> df2 = df1.copy()
>>> with pd.ExcelWriter('output.xlsx') as writer: # doctest: +SKIP
... df1.to_excel(writer, sheet_name='Sheet_name_1')
... df2.to_excel(writer, sheet_name='Sheet_name_2')
ExcelWriter can also be used to append to an existing Excel file:
>>> with pd.ExcelWriter('output.xlsx',
... mode='a') as writer: # doctest: +SKIP
... df.to_excel(writer, sheet_name='Sheet_name_3')
To set the library that is used to write the Excel file,
you can pass the `engine` keyword (the default engine is
automatically chosen depending on the file extension):
>>> df1.to_excel('output1.xlsx', engine='xlsxwriter') # doctest: +SKIP
- to_hdf(self, path_or_buf, key: 'str', mode: 'str' = 'a', complevel: 'int | None' = None, complib: 'str | None' = None, append: 'bool_t' = False, format: 'str | None' = None, index: 'bool_t' = True, min_itemsize: 'int | dict[str, int] | None' = None, nan_rep=None, dropna: 'bool_t | None' = None, data_columns: 'Literal[True] | list[str] | None' = None, errors: 'str' = 'strict', encoding: 'str' = 'UTF-8') -> 'None'
- Write the contained data to an HDF5 file using HDFStore.
Hierarchical Data Format (HDF) is self-describing, allowing an
application to interpret the structure and contents of a file with
no outside information. One HDF file can hold a mix of related objects
which can be accessed as a group or as individual objects.
In order to add another DataFrame or Series to an existing HDF file
please use append mode and a different a key.
.. warning::
One can store a subclass of ``DataFrame`` or ``Series`` to HDF5,
but the type of the subclass is lost upon storing.
For more information see the :ref:`user guide <io.hdf5>`.
Parameters
----------
path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
key : str
Identifier for the group in the store.
mode : {'a', 'w', 'r+'}, default 'a'
Mode to open file:
- 'w': write, a new file is created (an existing file with
the same name would be deleted).
- 'a': append, an existing file is opened for reading and
writing, and if the file does not exist it is created.
- 'r+': similar to 'a', but the file must already exist.
complevel : {0-9}, default None
Specifies a compression level for data.
A value of 0 or None disables compression.
complib : {'zlib', 'lzo', 'bzip2', 'blosc'}, default 'zlib'
Specifies the compression library to be used.
As of v0.20.2 these additional compressors for Blosc are supported
(default if no compressor specified: 'blosc:blosclz'):
{'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy',
'blosc:zlib', 'blosc:zstd'}.
Specifying a compression library which is not available issues
a ValueError.
append : bool, default False
For Table formats, append the input data to the existing.
format : {'fixed', 'table', None}, default 'fixed'
Possible values:
- 'fixed': Fixed format. Fast writing/reading. Not-appendable,
nor searchable.
- 'table': Table format. Write as a PyTables Table structure
which may perform worse but allow more flexible operations
like searching / selecting subsets of the data.
- If None, pd.get_option('io.hdf.default_format') is checked,
followed by fallback to "fixed".
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.
encoding : str, default "UTF-8"
min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
nan_rep : Any, optional
How to represent null values as str.
Not allowed with append=True.
data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes
of the object are indexed. See :ref:`io.hdf5-query-data-columns`.
Applicable only to format='table'.
See Also
--------
read_hdf : Read from HDF file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
DataFrame.to_sql : Write to a SQL table.
DataFrame.to_feather : Write out feather-format for DataFrames.
DataFrame.to_csv : Write out to a csv file.
Examples
--------
>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
... index=['a', 'b', 'c']) # doctest: +SKIP
>>> df.to_hdf('data.h5', key='df', mode='w') # doctest: +SKIP
We can add another object to the same file:
>>> s = pd.Series([1, 2, 3, 4]) # doctest: +SKIP
>>> s.to_hdf('data.h5', key='s') # doctest: +SKIP
Reading from HDF file:
>>> pd.read_hdf('data.h5', 'df') # doctest: +SKIP
A B
a 1 4
b 2 5
c 3 6
>>> pd.read_hdf('data.h5', 's') # doctest: +SKIP
0 1
1 2
2 3
3 4
dtype: int64
- to_json(self, path_or_buf: 'FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None' = None, orient: 'str | None' = None, date_format: 'str | None' = None, double_precision: 'int' = 10, force_ascii: 'bool_t' = True, date_unit: 'str' = 'ms', default_handler: 'Callable[[Any], JSONSerializable] | None' = None, lines: 'bool_t' = False, compression: 'CompressionOptions' = 'infer', index: 'bool_t' = True, indent: 'int | None' = None, storage_options: 'StorageOptions' = None) -> 'str | None'
- Convert the object to a JSON string.
Note NaN's and None will be converted to null and datetime objects
will be converted to UNIX timestamps.
Parameters
----------
path_or_buf : str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string.
orient : str
Indication of expected JSON string format.
* Series:
- default is 'index'
- allowed values are: {'split', 'records', 'index', 'table'}.
* DataFrame:
- default is 'columns'
- allowed values are: {'split', 'records', 'index', 'columns',
'values', 'table'}.
* The format of the JSON string:
- 'split' : dict like {'index' -> [index], 'columns' -> [columns],
'data' -> [values]}
- 'records' : list like [{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
- 'columns' : dict like {column -> {index -> value}}
- 'values' : just the values array
- 'table' : dict like {'schema': {schema}, 'data': {data}}
Describing the data, where data component is like ``orient='records'``.
date_format : {None, 'epoch', 'iso'}
Type of date conversion. 'epoch' = epoch milliseconds,
'iso' = ISO8601. The default depends on the `orient`. For
``orient='table'``, the default is 'iso'. For all other orients,
the default is 'epoch'.
double_precision : int, default 10
The number of decimal places to use when encoding
floating point values.
force_ascii : bool, default True
Force encoded string to be ASCII.
date_unit : str, default 'ms' (milliseconds)
The time unit to encode to, governs timestamp and ISO8601
precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
microsecond, and nanosecond respectively.
default_handler : callable, default None
Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serialisable object.
lines : bool, default False
If 'orient' is 'records' write out line-delimited json format. Will
throw ValueError if incorrect 'orient' since others are not
list-like.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path_or_buf'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
.. versionchanged:: 1.4.0 Zstandard support.
index : bool, default True
Whether to include the index values in the JSON string. Not
including the index (``index=False``) is only supported when
orient is 'split' or 'table'.
indent : int, optional
Length of whitespace used to indent each record.
.. versionadded:: 1.0.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
Returns
-------
None or str
If path_or_buf is None, returns the resulting json format as a
string. Otherwise returns None.
See Also
--------
read_json : Convert a JSON string to pandas object.
Notes
-----
The behavior of ``indent=0`` varies from the stdlib, which does not
indent the output but does insert newlines. Currently, ``indent=0``
and the default ``indent=None`` are equivalent in pandas, though this
may change in a future release.
``orient='table'`` contains a 'pandas_version' field under 'schema'.
This stores the version of `pandas` used in the latest revision of the
schema.
Examples
--------
>>> import json
>>> df = pd.DataFrame(
... [["a", "b"], ["c", "d"]],
... index=["row 1", "row 2"],
... columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"columns": [
"col 1",
"col 2"
],
"index": [
"row 1",
"row 2"
],
"data": [
[
"a",
"b"
],
[
"c",
"d"
]
]
}
Encoding/decoding a Dataframe using ``'records'`` formatted JSON.
Note that index labels are not preserved with this encoding.
>>> result = df.to_json(orient="records")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
[
{
"col 1": "a",
"col 2": "b"
},
{
"col 1": "c",
"col 2": "d"
}
]
Encoding/decoding a Dataframe using ``'index'`` formatted JSON:
>>> result = df.to_json(orient="index")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"row 1": {
"col 1": "a",
"col 2": "b"
},
"row 2": {
"col 1": "c",
"col 2": "d"
}
}
Encoding/decoding a Dataframe using ``'columns'`` formatted JSON:
>>> result = df.to_json(orient="columns")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"col 1": {
"row 1": "a",
"row 2": "c"
},
"col 2": {
"row 1": "b",
"row 2": "d"
}
}
Encoding/decoding a Dataframe using ``'values'`` formatted JSON:
>>> result = df.to_json(orient="values")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
[
[
"a",
"b"
],
[
"c",
"d"
]
]
Encoding with Table Schema:
>>> result = df.to_json(orient="table")
>>> parsed = json.loads(result)
>>> json.dumps(parsed, indent=4) # doctest: +SKIP
{
"schema": {
"fields": [
{
"name": "index",
"type": "string"
},
{
"name": "col 1",
"type": "string"
},
{
"name": "col 2",
"type": "string"
}
],
"primaryKey": [
"index"
],
"pandas_version": "1.4.0"
},
"data": [
{
"index": "row 1",
"col 1": "a",
"col 2": "b"
},
{
"index": "row 2",
"col 1": "c",
"col 2": "d"
}
]
}
- to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, bold_rows=False, column_format=None, longtable=None, escape=None, encoding=None, decimal='.', multicolumn=None, multicolumn_format=None, multirow=None, caption=None, label=None, position=None)
- Render object to a LaTeX tabular, longtable, or nested table.
Requires ``\usepackage{booktabs}``. The output can be copy/pasted
into a main LaTeX document or read from an external file
with ``\input{table.tex}``.
.. versionchanged:: 1.0.0
Added caption and label arguments.
.. versionchanged:: 1.2.0
Added position argument, changed meaning of caption argument.
Parameters
----------
buf : str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns : list of label, optional
The subset of columns to write. Writes all columns by default.
col_space : int, optional
The minimum width of each column.
header : bool or list of str, default True
Write out the column names. If a list of strings is given,
it is assumed to be aliases for the column names.
index : bool, default True
Write row names (index).
na_rep : str, default 'NaN'
Missing data representation.
formatters : list of functions or dict of {str: function}, optional
Formatter functions to apply to columns' elements by position or
name. The result of each function must be a unicode string.
List must be of length equal to the number of columns.
float_format : one-parameter function or str, optional, default None
Formatter for floating point numbers. For example
``float_format="%.2f"`` and ``float_format="{:0.2f}".format`` will
both result in 0.1234 being formatted as 0.12.
sparsify : bool, optional
Set to False for a DataFrame with a hierarchical index to print
every multiindex key at each row. By default, the value will be
read from the config module.
index_names : bool, default True
Prints the names of the indexes.
bold_rows : bool, default False
Make the row labels bold in the output.
column_format : str, optional
The columns format as specified in `LaTeX table format
<https://en.wikibooks.org/wiki/LaTeX/Tables>`__ e.g. 'rcl' for 3
columns. By default, 'l' will be used for all columns except
columns of numbers, which default to 'r'.
longtable : bool, optional
By default, the value will be read from the pandas config
module. Use a longtable environment instead of tabular. Requires
adding a \usepackage{longtable} to your LaTeX preamble.
escape : bool, optional
By default, the value will be read from the pandas config
module. When set to False prevents from escaping latex special
characters in column names.
encoding : str, optional
A string representing the encoding to use in the output file,
defaults to 'utf-8'.
decimal : str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
multicolumn : bool, default True
Use \multicolumn to enhance MultiIndex columns.
The default will be read from the config module.
multicolumn_format : str, default 'l'
The alignment for multicolumns, similar to `column_format`
The default will be read from the config module.
multirow : bool, default False
Use \multirow to enhance MultiIndex rows. Requires adding a
\usepackage{multirow} to your LaTeX preamble. Will print
centered labels (instead of top-aligned) across the contained
rows, separating groups via clines. The default will be read
from the pandas config module.
caption : str or tuple, optional
Tuple (full_caption, short_caption),
which results in ``\caption[short_caption]{full_caption}``;
if a single string is passed, no short caption will be set.
.. versionadded:: 1.0.0
.. versionchanged:: 1.2.0
Optionally allow caption to be a tuple ``(full_caption, short_caption)``.
label : str, optional
The LaTeX label to be placed inside ``\label{}`` in the output.
This is used with ``\ref{}`` in the main ``.tex`` file.
.. versionadded:: 1.0.0
position : str, optional
The LaTeX positional argument for tables, to be placed after
``\begin{}`` in the output.
.. versionadded:: 1.2.0
Returns
-------
str or None
If buf is None, returns the result as a string. Otherwise returns
None.
See Also
--------
Styler.to_latex : Render a DataFrame to LaTeX with conditional formatting.
DataFrame.to_string : Render a DataFrame to a console-friendly
tabular output.
DataFrame.to_html : Render a DataFrame as an HTML table.
Examples
--------
>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
... mask=['red', 'purple'],
... weapon=['sai', 'bo staff']))
>>> print(df.to_latex(index=False)) # doctest: +SKIP
\begin{tabular}{lll}
\toprule
name & mask & weapon \\
\midrule
Raphael & red & sai \\
Donatello & purple & bo staff \\
\bottomrule
\end{tabular}
- to_pickle(self, path, compression: 'CompressionOptions' = 'infer', protocol: 'int' = 5, storage_options: 'StorageOptions' = None) -> 'None'
- Pickle (serialize) object to file.
Parameters
----------
path : str
File path where the pickled object will be stored.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path'
path-like, then detect compression from the following extensions: '.gz',
'.bz2', '.zip', '.xz', or '.zst' (otherwise no compression). Set to
``None`` for no compression. Can also be a dict with key ``'method'`` set
to one of {``'zip'``, ``'gzip'``, ``'bz2'``, ``'zstd'``} and other
key-value pairs are forwarded to ``zipfile.ZipFile``, ``gzip.GzipFile``,
``bz2.BZ2File``, or ``zstandard.ZstdDecompressor``, respectively. As an
example, the following could be passed for faster compression and to create
a reproducible gzip archive:
``compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}``.
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (see [1]_ paragraph 12.1.2). The possible
values are 0, 1, 2, 3, 4, 5. A negative value for the protocol
parameter is equivalent to setting its value to HIGHEST_PROTOCOL.
.. [1] https://docs.python.org/3/library/pickle.html.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc. For HTTP(S) URLs the key-value pairs
are forwarded to ``urllib`` as header options. For other URLs (e.g.
starting with "s3://", and "gcs://") the key-value pairs are forwarded to
``fsspec``. Please see ``fsspec`` and ``urllib`` for more details.
.. versionadded:: 1.2.0
See Also
--------
read_pickle : Load pickled pandas object (or any object) from file.
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_sql : Write DataFrame to a SQL database.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
Examples
--------
>>> original_df = pd.DataFrame({"foo": range(5), "bar": range(5, 10)}) # doctest: +SKIP
>>> original_df # doctest: +SKIP
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
>>> original_df.to_pickle("./dummy.pkl") # doctest: +SKIP
>>> unpickled_df = pd.read_pickle("./dummy.pkl") # doctest: +SKIP
>>> unpickled_df # doctest: +SKIP
foo bar
0 0 5
1 1 6
2 2 7
3 3 8
4 4 9
- to_sql(self, name: 'str', con, schema=None, if_exists: 'str' = 'fail', index: 'bool_t' = True, index_label=None, chunksize=None, dtype: 'DtypeArg | None' = None, method=None) -> 'int | None'
- Write records stored in a DataFrame to a SQL database.
Databases supported by SQLAlchemy [1]_ are supported. Tables can be
newly created, appended to, or overwritten.
Parameters
----------
name : str
Name of SQL table.
con : sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_.
schema : str, optional
Specify the schema (if database flavor supports this). If None, use
default schema.
if_exists : {'fail', 'replace', 'append'}, default 'fail'
How to behave if the table already exists.
* fail: Raise a ValueError.
* replace: Drop the table before inserting new values.
* append: Insert new values to the existing table.
index : bool, default True
Write DataFrame index as a column. Uses `index_label` as the column
name in the table.
index_label : str or sequence, default None
Column label for index column(s). If None is given (default) and
`index` is True, then the index names are used.
A sequence should be given if the DataFrame uses MultiIndex.
chunksize : int, optional
Specify the number of rows in each batch to be written at a time.
By default, all rows will be written at once.
dtype : dict or scalar, optional
Specifying the datatype for columns. If a dictionary is used, the
keys should be the column names and the values should be the
SQLAlchemy types or strings for the sqlite3 legacy mode. If a
scalar is provided, it will be applied to all columns.
method : {None, 'multi', callable}, optional
Controls the SQL insertion clause used:
* None : Uses standard SQL ``INSERT`` clause (one per row).
* 'multi': Pass multiple values in a single ``INSERT`` clause.
* callable with signature ``(pd_table, conn, keys, data_iter)``.
Details and a sample callable implementation can be found in the
section :ref:`insert method <io.sql.method>`.
Returns
-------
None or int
Number of rows affected by to_sql. None is returned if the callable
passed into ``method`` does not return the number of rows.
The number of returned rows affected is the sum of the ``rowcount``
attribute of ``sqlite3.Cursor`` or SQLAlchemy connectable which may not
reflect the exact number of written rows as stipulated in the
`sqlite3 <https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.rowcount>`__ or
`SQLAlchemy <https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.BaseCursorResult.rowcount>`__.
.. versionadded:: 1.4.0
Raises
------
ValueError
When the table already exists and `if_exists` is 'fail' (the
default).
See Also
--------
read_sql : Read a DataFrame from a table.
Notes
-----
Timezone aware datetime columns will be written as
``Timestamp with timezone`` type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as timezone unaware
timestamps local to the original timezone.
References
----------
.. [1] https://docs.sqlalchemy.org
.. [2] https://www.python.org/dev/peps/pep-0249/
Examples
--------
Create an in-memory SQLite database.
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)
Create a table from scratch with 3 rows.
>>> df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
>>> df
name
0 User 1
1 User 2
2 User 3
>>> df.to_sql('users', con=engine)
3
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]
An `sqlalchemy.engine.Connection` can also be passed to `con`:
>>> with engine.begin() as connection:
... df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
... df1.to_sql('users', con=connection, if_exists='append')
2
This is allowed to support operations that require that the same
DBAPI connection is used for the entire operation.
>>> df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
>>> df2.to_sql('users', con=engine, if_exists='append')
2
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
(0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
(1, 'User 7')]
Overwrite the table with just ``df2``.
>>> df2.to_sql('users', con=engine, if_exists='replace',
... index_label='id')
2
>>> engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 6'), (1, 'User 7')]
Specify the dtype (especially useful for integers with missing values).
Notice that while pandas is forced to store the data as floating point,
the database supports nullable integers. When fetching the data with
Python, we get back integer scalars.
>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
A
0 1.0
1 NaN
2 2.0
>>> from sqlalchemy.types import Integer
>>> df.to_sql('integers', con=engine, index=False,
... dtype={"A": Integer()})
3
>>> engine.execute("SELECT * FROM integers").fetchall()
[(1,), (None,), (2,)]
- to_xarray(self)
- Return an xarray object from the pandas object.
Returns
-------
xarray.DataArray or xarray.Dataset
Data in the pandas structure converted to Dataset if the object is
a DataFrame, or a DataArray if the object is a Series.
See Also
--------
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.
Notes
-----
See the `xarray docs <https://xarray.pydata.org/en/stable/>`__
Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
... ('parrot', 'bird', 24.0, 2),
... ('lion', 'mammal', 80.5, 4),
... ('monkey', 'mammal', np.nan, 4)],
... columns=['name', 'class', 'max_speed',
... 'num_legs'])
>>> df
name class max_speed num_legs
0 falcon bird 389.0 2
1 parrot bird 24.0 2
2 lion mammal 80.5 4
3 monkey mammal NaN 4
>>> df.to_xarray()
<xarray.Dataset>
Dimensions: (index: 4)
Coordinates:
* index (index) int64 0 1 2 3
Data variables:
name (index) object 'falcon' 'parrot' 'lion' 'monkey'
class (index) object 'bird' 'bird' 'mammal' 'mammal'
max_speed (index) float64 389.0 24.0 80.5 nan
num_legs (index) int64 2 2 4 4
>>> df['max_speed'].to_xarray()
<xarray.DataArray 'max_speed' (index: 4)>
array([389. , 24. , 80.5, nan])
Coordinates:
* index (index) int64 0 1 2 3
>>> dates = pd.to_datetime(['2018-01-01', '2018-01-01',
... '2018-01-02', '2018-01-02'])
>>> df_multiindex = pd.DataFrame({'date': dates,
... 'animal': ['falcon', 'parrot',
... 'falcon', 'parrot'],
... 'speed': [350, 18, 361, 15]})
>>> df_multiindex = df_multiindex.set_index(['date', 'animal'])
>>> df_multiindex
speed
date animal
2018-01-01 falcon 350
parrot 18
2018-01-02 falcon 361
parrot 15
>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions: (animal: 2, date: 2)
Coordinates:
* date (date) datetime64[ns] 2018-01-01 2018-01-02
* animal (animal) object 'falcon' 'parrot'
Data variables:
speed (date, animal) int64 350 18 361 15
- truncate(self: 'NDFrameT', before=None, after=None, axis=None, copy: 'bool_t' = True) -> 'NDFrameT'
- Truncate a Series or DataFrame before and after some index value.
This is a useful shorthand for boolean indexing based on index
values above or below certain thresholds.
Parameters
----------
before : date, str, int
Truncate all rows before this index value.
after : date, str, int
Truncate all rows after this index value.
axis : {0 or 'index', 1 or 'columns'}, optional
Axis to truncate. Truncates the index (rows) by default.
copy : bool, default is True,
Return a copy of the truncated section.
Returns
-------
type of caller
The truncated Series or DataFrame.
See Also
--------
DataFrame.loc : Select a subset of a DataFrame by label.
DataFrame.iloc : Select a subset of a DataFrame by position.
Notes
-----
If the index being truncated contains only datetime values,
`before` and `after` may be specified as strings instead of
Timestamps.
Examples
--------
>>> df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
... 'B': ['f', 'g', 'h', 'i', 'j'],
... 'C': ['k', 'l', 'm', 'n', 'o']},
... index=[1, 2, 3, 4, 5])
>>> df
A B C
1 a f k
2 b g l
3 c h m
4 d i n
5 e j o
>>> df.truncate(before=2, after=4)
A B C
2 b g l
3 c h m
4 d i n
The columns of a DataFrame can be truncated.
>>> df.truncate(before="A", after="B", axis="columns")
A B
1 a f
2 b g
3 c h
4 d i
5 e j
For Series, only rows can be truncated.
>>> df['A'].truncate(before=2, after=4)
2 b
3 c
4 d
Name: A, dtype: object
The index values in ``truncate`` can be datetimes or string
dates.
>>> dates = pd.date_range('2016-01-01', '2016-02-01', freq='s')
>>> df = pd.DataFrame(index=dates, data={'A': 1})
>>> df.tail()
A
2016-01-31 23:59:56 1
2016-01-31 23:59:57 1
2016-01-31 23:59:58 1
2016-01-31 23:59:59 1
2016-02-01 00:00:00 1
>>> df.truncate(before=pd.Timestamp('2016-01-05'),
... after=pd.Timestamp('2016-01-10')).tail()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
Because the index is a DatetimeIndex containing only dates, we can
specify `before` and `after` as strings. They will be coerced to
Timestamps before truncation.
>>> df.truncate('2016-01-05', '2016-01-10').tail()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
Note that ``truncate`` assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
>>> df.loc['2016-01-05':'2016-01-10', :].tail()
A
2016-01-10 23:59:55 1
2016-01-10 23:59:56 1
2016-01-10 23:59:57 1
2016-01-10 23:59:58 1
2016-01-10 23:59:59 1
- tshift(self: 'NDFrameT', periods: 'int' = 1, freq=None, axis: 'Axis' = 0) -> 'NDFrameT'
- Shift the time index, using the index's frequency if available.
.. deprecated:: 1.1.0
Use `shift` instead.
Parameters
----------
periods : int
Number of periods to move, can be positive or negative.
freq : DateOffset, timedelta, or str, default None
Increment to use from the tseries module
or time rule expressed as a string (e.g. 'EOM').
axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Corresponds to the axis that contains the Index.
Returns
-------
shifted : Series/DataFrame
Notes
-----
If freq is not specified then tries to use the freq or inferred_freq
attributes of the index. If neither of those attributes exist, a
ValueError is thrown
- tz_convert(self: 'NDFrameT', tz, axis=0, level=None, copy: 'bool_t' = True) -> 'NDFrameT'
- Convert tz-aware axis to target time zone.
Parameters
----------
tz : str or tzinfo object
axis : the axis to convert
level : int, str, default None
If axis is a MultiIndex, convert a specific level. Otherwise
must be None.
copy : bool, default True
Also make a copy of the underlying data.
Returns
-------
{klass}
Object with time zone converted axis.
Raises
------
TypeError
If the axis is tz-naive.
- tz_localize(self: 'NDFrameT', tz, axis=0, level=None, copy: 'bool_t' = True, ambiguous='raise', nonexistent: 'str' = 'raise') -> 'NDFrameT'
- Localize tz-naive index of a Series or DataFrame to target time zone.
This operation localizes the Index. To localize the values in a
timezone-naive Series, use :meth:`Series.dt.tz_localize`.
Parameters
----------
tz : str or tzinfo
axis : the axis to localize
level : int, str, default None
If axis ia a MultiIndex, localize a specific level. Otherwise
must be None.
copy : bool, default True
Also make a copy of the underlying data.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : str, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST. Valid values are:
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
Series or DataFrame
Same type as the input.
Raises
------
TypeError
If the TimeSeries is tz-aware and tz is not None.
Examples
--------
Localize local times:
>>> s = pd.Series([1],
... index=pd.DatetimeIndex(['2018-09-15 01:30:00']))
>>> s.tz_localize('CET')
2018-09-15 01:30:00+02:00 1
dtype: int64
Be careful with DST changes. When there is sequential data, pandas
can infer the DST time:
>>> s = pd.Series(range(7),
... index=pd.DatetimeIndex(['2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.tz_localize('CET', ambiguous='infer')
2018-10-28 01:30:00+02:00 0
2018-10-28 02:00:00+02:00 1
2018-10-28 02:30:00+02:00 2
2018-10-28 02:00:00+01:00 3
2018-10-28 02:30:00+01:00 4
2018-10-28 03:00:00+01:00 5
2018-10-28 03:30:00+01:00 6
dtype: int64
In some cases, inferring the DST is impossible. In such cases, you can
pass an ndarray to the ambiguous parameter to set the DST explicitly
>>> s = pd.Series(range(3),
... index=pd.DatetimeIndex(['2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.tz_localize('CET', ambiguous=np.array([True, True, False]))
2018-10-28 01:20:00+02:00 0
2018-10-28 02:36:00+02:00 1
2018-10-28 03:46:00+01:00 2
dtype: int64
If the DST transition causes nonexistent times, you can shift these
dates forward or backward with a timedelta object or `'shift_forward'`
or `'shift_backward'`.
>>> s = pd.Series(range(2),
... index=pd.DatetimeIndex(['2015-03-29 02:30:00',
... '2015-03-29 03:30:00']))
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
2015-03-29 03:00:00+02:00 0
2015-03-29 03:30:00+02:00 1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
2015-03-29 01:59:59.999999999+01:00 0
2015-03-29 03:30:00+02:00 1
dtype: int64
>>> s.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1H'))
2015-03-29 03:30:00+02:00 0
2015-03-29 03:30:00+02:00 1
dtype: int64
- xs(self, key, axis=0, level=None, drop_level: 'bool_t' = True)
- Return cross-section from the Series/DataFrame.
This method takes a `key` argument to select data at a particular
level of a MultiIndex.
Parameters
----------
key : label or tuple of label
Label contained in the index, or partially in a MultiIndex.
axis : {0 or 'index', 1 or 'columns'}, default 0
Axis to retrieve cross-section on.
level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate
which levels are used. Levels can be referred by label or position.
drop_level : bool, default True
If False, returns object with same levels as self.
Returns
-------
Series or DataFrame
Cross-section from the original Series or DataFrame
corresponding to the selected index levels.
See Also
--------
DataFrame.loc : Access a group of rows and columns
by label(s) or a boolean array.
DataFrame.iloc : Purely integer-location based indexing
for selection by position.
Notes
-----
`xs` can not be used to set values.
MultiIndex Slicers is a generic way to get/set values on
any level or levels.
It is a superset of `xs` functionality, see
:ref:`MultiIndex Slicers <advanced.mi_slicers>`.
Examples
--------
>>> d = {'num_legs': [4, 4, 2, 2],
... 'num_wings': [0, 0, 2, 2],
... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
... 'animal': ['cat', 'dog', 'bat', 'penguin'],
... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = pd.DataFrame(data=d)
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df
num_legs num_wings
class animal locomotion
mammal cat walks 4 0
dog walks 4 0
bat flies 2 2
bird penguin walks 2 2
Get values at specified index
>>> df.xs('mammal')
num_legs num_wings
animal locomotion
cat walks 4 0
dog walks 4 0
bat flies 2 2
Get values at several indexes
>>> df.xs(('mammal', 'dog'))
num_legs num_wings
locomotion
walks 4 0
Get values at specified index and level
>>> df.xs('cat', level=1)
num_legs num_wings
class locomotion
mammal walks 4 0
Get values at several indexes and levels
>>> df.xs(('bird', 'walks'),
... level=[0, 'locomotion'])
num_legs num_wings
animal
penguin 2 2
Get values at specified column and axis
>>> df.xs('num_wings', axis=1)
class animal locomotion
mammal cat walks 0
dog walks 0
bat flies 2
bird penguin walks 2
Name: num_wings, dtype: int64
Readonly properties inherited from pandas.core.generic.NDFrame:
- flags
- Get the properties associated with this pandas object.
The available flags are
* :attr:`Flags.allows_duplicate_labels`
See Also
--------
Flags : Flags that apply to pandas objects.
DataFrame.attrs : Global metadata applying to this dataset.
Notes
-----
"Flags" differ from "metadata". Flags reflect properties of the
pandas object (the Series or DataFrame). Metadata refer to properties
of the dataset, and should be stored in :attr:`DataFrame.attrs`.
Examples
--------
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>
Flags can be get or set using ``.``
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
Data descriptors inherited from pandas.core.generic.NDFrame:
- attrs
- Dictionary of global attributes of this dataset.
.. warning::
attrs is experimental and may change without warning.
See Also
--------
DataFrame.flags : Global flags applying to this object.
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
Readonly properties inherited from pandas.core.indexing.IndexingMixin:
- at
- Access a single value for a row/column label pair.
Similar to ``loc``, in that both provide label-based lookups. Use
``at`` if you only need to get or set a single value in a DataFrame
or Series.
Raises
------
KeyError
If 'label' does not exist in DataFrame.
See Also
--------
DataFrame.iat : Access a single value for a row/column pair by integer
position.
DataFrame.loc : Access a group of rows and columns by label(s).
Series.at : Access a single value using a label.
Examples
--------
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
A B C
4 0 2 3
5 0 4 1
6 10 20 30
Get value at specified row/column pair
>>> df.at[4, 'B']
2
Set value at specified row/column pair
>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10
Get value within a Series
>>> df.loc[5].at['B']
4
- iat
- Access a single value for a row/column pair by integer position.
Similar to ``iloc``, in that both provide integer-based lookups. Use
``iat`` if you only need to get or set a single value in a DataFrame
or Series.
Raises
------
IndexError
When integer position is out of bounds.
See Also
--------
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.loc : Access a group of rows and columns by label(s).
DataFrame.iloc : Access a group of rows and columns by integer position(s).
Examples
--------
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... columns=['A', 'B', 'C'])
>>> df
A B C
0 0 2 3
1 0 4 1
2 10 20 30
Get value at specified row/column pair
>>> df.iat[1, 2]
1
Set value at specified row/column pair
>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10
Get value within a series
>>> df.loc[0].iat[1]
2
- iloc
- Purely integer-location based indexing for selection by position.
``.iloc[]`` is primarily integer position based (from ``0`` to
``length-1`` of the axis), but may also be used with a boolean
array.
Allowed inputs are:
- An integer, e.g. ``5``.
- A list or array of integers, e.g. ``[4, 3, 0]``.
- A slice object with ints, e.g. ``1:7``.
- A boolean array.
- A ``callable`` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above).
This is useful in method chains, when you don't have a reference to the
calling object, but would like to base your selection on some value.
``.iloc`` will raise ``IndexError`` if a requested indexer is
out-of-bounds, except *slice* indexers which allow out-of-bounds
indexing (this conforms with python/numpy *slice* semantics).
See more at :ref:`Selection by Position <indexing.integer>`.
See Also
--------
DataFrame.iat : Fast integer location scalar accessor.
DataFrame.loc : Purely label-location based indexer for selection by label.
Series.iloc : Purely integer-location based indexing for
selection by position.
Examples
--------
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
... {'a': 100, 'b': 200, 'c': 300, 'd': 400},
... {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
**Indexing just the rows**
With a scalar integer.
>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a 1
b 2
c 3
d 4
Name: 0, dtype: int64
With a list of integers.
>>> df.iloc[[0]]
a b c d
0 1 2 3 4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
a b c d
0 1 2 3 4
1 100 200 300 400
With a `slice` object.
>>> df.iloc[:3]
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
With a boolean mask the same length as the index.
>>> df.iloc[[True, False, True]]
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
With a callable, useful in method chains. The `x` passed
to the ``lambda`` is the DataFrame being sliced. This selects
the rows whose index label even.
>>> df.iloc[lambda x: x.index % 2 == 0]
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
**Indexing both axes**
You can mix the indexer types for the index and columns. Use ``:`` to
select the entire axis.
With scalar integers.
>>> df.iloc[0, 1]
2
With lists of integers.
>>> df.iloc[[0, 2], [1, 3]]
b d
0 2 4
2 2000 4000
With `slice` objects.
>>> df.iloc[1:3, 0:3]
a b c
1 100 200 300
2 1000 2000 3000
With a boolean array whose length matches the columns.
>>> df.iloc[:, [True, False, True, False]]
a c
0 1 3
1 100 300
2 1000 3000
With a callable function that expects the Series or DataFrame.
>>> df.iloc[:, lambda df: [0, 2]]
a c
0 1 3
1 100 300
2 1000 3000
- loc
- Access a group of rows and columns by label(s) or a boolean array.
``.loc[]`` is primarily label based, but may also be used with a
boolean array.
Allowed inputs are:
- A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
interpreted as a *label* of the index, and **never** as an
integer position along the index).
- A list or array of labels, e.g. ``['a', 'b', 'c']``.
- A slice object with labels, e.g. ``'a':'f'``.
.. warning:: Note that contrary to usual python slices, **both** the
start and the stop are included
- A boolean array of the same length as the axis being sliced,
e.g. ``[True, False, True]``.
- An alignable boolean Series. The index of the key will be aligned before
masking.
- An alignable Index. The Index of the returned selection will be the input.
- A ``callable`` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above)
See more at :ref:`Selection by Label <indexing.label>`.
Raises
------
KeyError
If any items are not found.
IndexingError
If an indexed key is passed and its index is unalignable to the frame index.
See Also
--------
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.iloc : Access group of rows and columns by integer position(s).
DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the
Series/DataFrame.
Series.loc : Access group of values using labels.
Examples
--------
**Getting values**
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
Single label. Note this returns the row as a Series.
>>> df.loc['viper']
max_speed 4
shield 5
Name: viper, dtype: int64
List of labels. Note using ``[[]]`` returns a DataFrame.
>>> df.loc[['viper', 'sidewinder']]
max_speed shield
viper 4 5
sidewinder 7 8
Single label for row and column
>>> df.loc['cobra', 'shield']
2
Slice with labels for row and single label for column. As mentioned
above, note that both the start and stop of the slice are included.
>>> df.loc['cobra':'viper', 'max_speed']
cobra 1
viper 4
Name: max_speed, dtype: int64
Boolean list with the same length as the row axis
>>> df.loc[[False, False, True]]
max_speed shield
sidewinder 7 8
Alignable boolean Series:
>>> df.loc[pd.Series([False, True, False],
... index=['viper', 'sidewinder', 'cobra'])]
max_speed shield
sidewinder 7 8
Index (same behavior as ``df.reindex``)
>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
max_speed shield
foo
cobra 1 2
viper 4 5
Conditional that returns a boolean Series
>>> df.loc[df['shield'] > 6]
max_speed shield
sidewinder 7 8
Conditional that returns a boolean Series with column labels specified
>>> df.loc[df['shield'] > 6, ['max_speed']]
max_speed
sidewinder 7
Callable that returns a boolean Series
>>> df.loc[lambda df: df['shield'] == 8]
max_speed shield
sidewinder 7 8
**Setting values**
Set value for all items matching the list of labels
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
max_speed shield
cobra 1 2
viper 4 50
sidewinder 7 50
Set value for an entire row
>>> df.loc['cobra'] = 10
>>> df
max_speed shield
cobra 10 10
viper 4 50
sidewinder 7 50
Set value for an entire column
>>> df.loc[:, 'max_speed'] = 30
>>> df
max_speed shield
cobra 30 10
viper 30 50
sidewinder 30 50
Set value for rows matching callable condition
>>> df.loc[df['shield'] > 35] = 0
>>> df
max_speed shield
cobra 30 10
viper 0 0
sidewinder 0 0
**Getting values on a DataFrame with an index that has integer labels**
Another example using integers for the index
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
max_speed shield
7 1 2
8 4 5
9 7 8
Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.
>>> df.loc[7:9]
max_speed shield
7 1 2
8 4 5
9 7 8
**Getting values with a MultiIndex**
A number of examples using a DataFrame with a MultiIndex
>>> tuples = [
... ('cobra', 'mark i'), ('cobra', 'mark ii'),
... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
... ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
... [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
Single label. Note this returns a DataFrame with a single index.
>>> df.loc['cobra']
max_speed shield
mark i 12 2
mark ii 0 4
Single index tuple. Note this returns a Series.
>>> df.loc[('cobra', 'mark ii')]
max_speed 0
shield 4
Name: (cobra, mark ii), dtype: int64
Single label for row and column. Similar to passing in a tuple, this
returns a Series.
>>> df.loc['cobra', 'mark i']
max_speed 12
shield 2
Name: (cobra, mark i), dtype: int64
Single tuple. Note using ``[[]]`` returns a DataFrame.
>>> df.loc[[('cobra', 'mark ii')]]
max_speed shield
cobra mark ii 0 4
Single tuple for the index with a single label for the column
>>> df.loc[('cobra', 'mark i'), 'shield']
2
Slice from index tuple to single label
>>> df.loc[('cobra', 'mark i'):'viper']
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
Slice from index tuple to index tuple
>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
|
class SparseDtype(pandas.core.dtypes.base.ExtensionDtype) |
| |
SparseDtype(dtype: 'Dtype' = <class 'numpy.float64'>, fill_value: 'Any' = None)
Dtype for data stored in :class:`SparseArray`.
This dtype implements the pandas ExtensionDtype interface.
Parameters
----------
dtype : str, ExtensionDtype, numpy.dtype, type, default numpy.float64
The dtype of the underlying array storing the non-fill value values.
fill_value : scalar, optional
The scalar value not stored in the SparseArray. By default, this
depends on `dtype`.
=========== ==========
dtype na_value
=========== ==========
float ``np.nan``
int ``0``
bool ``False``
datetime64 ``pd.NaT``
timedelta64 ``pd.NaT``
=========== ==========
The default value may be overridden by specifying a `fill_value`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- SparseDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self)
- Return hash(self).
- __init__(self, dtype: 'Dtype' = <class 'numpy.float64'>, fill_value: 'Any' = None)
- Initialize self. See help(type(self)) for accurate signature.
- __repr__(self) -> 'str'
- Return repr(self).
- update_dtype(self, dtype) -> 'SparseDtype'
- Convert the SparseDtype to a new dtype.
This takes care of converting the ``fill_value``.
Parameters
----------
dtype : Union[str, numpy.dtype, SparseDtype]
The new dtype to use.
* For a SparseDtype, it is simply returned
* For a NumPy dtype (or str), the current fill value
is converted to the new dtype, and a SparseDtype
with `dtype` and the new fill value is returned.
Returns
-------
SparseDtype
A new SparseDtype with the correct `dtype` and fill value
for that `dtype`.
Raises
------
ValueError
When the current fill value cannot be converted to the
new `dtype` (e.g. trying to convert ``np.nan`` to an
integer dtype).
Examples
--------
>>> SparseDtype(int, 0).update_dtype(float)
Sparse[float64, 0.0]
>>> SparseDtype(int, 1).update_dtype(SparseDtype(float, np.nan))
Sparse[float64, nan]
Class methods defined here:
- construct_array_type() -> 'type_t[SparseArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
- construct_from_string(string: 'str') -> 'SparseDtype' from builtins.type
- Construct a SparseDtype from a string form.
Parameters
----------
string : str
Can take the following forms.
string dtype
================ ============================
'int' SparseDtype[np.int64, 0]
'Sparse' SparseDtype[np.float64, nan]
'Sparse[int]' SparseDtype[np.int64, 0]
'Sparse[int, 0]' SparseDtype[np.int64, 0]
================ ============================
It is not possible to specify non-default fill values
with a string. An argument like ``'Sparse[int, 1]'``
will raise a ``TypeError`` because the default fill value
for integers is 0.
Returns
-------
SparseDtype
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties defined here:
- fill_value
- The fill value of the array.
Converting the SparseArray to a dense ndarray will fill the
array with this value.
.. warning::
It's possible to end up with a SparseArray that has ``fill_value``
values in ``sp_values``. This can occur, for example, when setting
``SparseArray.fill_value`` directly.
- kind
- The sparse kind. Either 'integer', or 'block'.
- name
- A string identifying the data type.
Will be used for display in, e.g. ``Series.dtype``
- subtype
- type
- The scalar type for the array, e.g. ``int``
It's expected ``ExtensionArray[item]`` returns an instance
of ``ExtensionDtype.type`` for scalar ``item``, assuming
that value is valid (not NA). NA values do not need to be
instances of `type`.
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- na_value
- Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the
user-facing "boxed" version of the NA value, not the physical NA value
for storage. e.g. for JSONArray, this is an empty dictionary.
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.dtypes.base.ExtensionDtype:
- __annotations__ = {'_metadata': 'tuple[str, ...]'}
|
class StringDtype(pandas.core.dtypes.base.ExtensionDtype) |
| |
StringDtype(storage=None)
Extension dtype for string data.
.. versionadded:: 1.0.0
.. warning::
StringDtype is considered experimental. The implementation and
parts of the API may change without warning.
In particular, StringDtype.na_value may change to no longer be
``numpy.nan``.
Parameters
----------
storage : {"python", "pyarrow"}, optional
If not given, the value of ``pd.options.mode.string_storage``.
Attributes
----------
None
Methods
-------
None
Examples
--------
>>> pd.StringDtype()
string[python]
>>> pd.StringDtype(storage="pyarrow")
string[pyarrow] |
| |
- Method resolution order:
- StringDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Methods defined here:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseStringArray'
- Construct StringArray from pyarrow Array/ChunkedArray.
- __hash__(self) -> 'int'
- Return hash(self).
- __init__(self, storage=None)
- Initialize self. See help(type(self)) for accurate signature.
- __repr__(self)
- Return repr(self).
- __str__(self)
- Return str(self).
- construct_array_type(self) -> 'type_t[BaseStringArray]'
- Return the array type associated with this dtype.
Returns
-------
type
Class methods defined here:
- construct_from_string(string) from builtins.type
- Construct a StringDtype from a string.
Parameters
----------
string : str
The type of the name. The storage type will be taking from `string`.
Valid options and their storage types are
========================== ==============================================
string result storage
========================== ==============================================
``'string'`` pd.options.mode.string_storage, default python
``'string[python]'`` python
``'string[pyarrow]'`` pyarrow
========================== ==============================================
Returns
-------
StringDtype
Raise
-----
TypeError
If the string is not a valid option.
Readonly properties defined here:
- type
- The scalar type for the array, e.g. ``int``
It's expected ``ExtensionArray[item]`` returns an instance
of ``ExtensionDtype.type`` for scalar ``item``, assuming
that value is valid (not NA). NA values do not need to be
instances of `type`.
Data and other attributes defined here:
- na_value = <NA>
- name = 'string'
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- kind
- A character code (one of 'biufcmMOSUV'), default 'O'
This should match the NumPy dtype used when the array is
converted to an ndarray, which is probably 'O' for object if
the extension type cannot be represented as a built-in NumPy
type.
See Also
--------
numpy.dtype.kind
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.dtypes.base.ExtensionDtype:
- __annotations__ = {'_metadata': 'tuple[str, ...]'}
|
class Timedelta(_Timedelta) |
| |
Timedelta(value=<object object at 0x000002042F571B50>, unit=None, **kwargs)
Represents a duration, the difference between two dates or times.
Timedelta is the pandas equivalent of python's ``datetime.timedelta``
and is interchangeable with it in most cases.
Parameters
----------
value : Timedelta, timedelta, np.timedelta64, str, or int
unit : str, default 'ns'
Denote the unit of the input, if input is an integer.
Possible values:
* 'W', 'D', 'T', 'S', 'L', 'U', or 'N'
* 'days' or 'day'
* 'hours', 'hour', 'hr', or 'h'
* 'minutes', 'minute', 'min', or 'm'
* 'seconds', 'second', or 'sec'
* 'milliseconds', 'millisecond', 'millis', or 'milli'
* 'microseconds', 'microsecond', 'micros', or 'micro'
* 'nanoseconds', 'nanosecond', 'nanos', 'nano', or 'ns'.
**kwargs
Available kwargs: {days, seconds, microseconds,
milliseconds, minutes, hours, weeks}.
Values for construction in compat with datetime.timedelta.
Numpy ints and floats will be coerced to python ints and floats.
Notes
-----
The constructor may take in either both values of value and unit or
kwargs as above. Either one of them must be used during initialization
The ``.value`` attribute is always in ns.
If the precision is higher than nanoseconds, the precision of the duration is
truncated to nanoseconds.
Examples
--------
Here we initialize Timedelta object with both value and unit
>>> td = pd.Timedelta(1, "d")
>>> td
Timedelta('1 days 00:00:00')
Here we initialize the Timedelta object with kwargs
>>> td2 = pd.Timedelta(days=1)
>>> td2
Timedelta('1 days 00:00:00')
We see that either way we get the same result |
| |
- Method resolution order:
- Timedelta
- _Timedelta
- datetime.timedelta
- builtins.object
Methods defined here:
- __abs__(self)
- __add__(self, other)
- __divmod__(self, other)
- __floordiv__(self, other)
- __mod__(self, other)
- __mul__(self, other)
- __neg__(self)
- __new__(cls, value=<object object at 0x000002042F571B50>, unit=None, **kwargs)
- __pos__(self)
- __radd__(self, other)
- __rdivmod__(self, other)
- __reduce__(self)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__ = __mul__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __setstate__(self, state)
- __sub__(self, other)
- __truediv__(self, other)
- ceil(self, freq)
- Return a new Timedelta ceiled to this resolution.
Parameters
----------
freq : str
Frequency string indicating the ceiling resolution.
- floor(self, freq)
- Return a new Timedelta floored to this resolution.
Parameters
----------
freq : str
Frequency string indicating the flooring resolution.
- round(self, freq)
- Round the Timedelta to the specified resolution.
Parameters
----------
freq : str
Frequency string indicating the rounding resolution.
Returns
-------
a new Timedelta rounded to the given resolution of `freq`
Raises
------
ValueError if the freq cannot be converted
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes defined here:
- max = Timedelta('106751 days 23:47:16.854775807')
- min = Timedelta('-106752 days +00:12:43.145224193')
- resolution = Timedelta('0 days 00:00:00.000000001')
Methods inherited from _Timedelta:
- __bool__(self, /)
- self != 0
- __eq__(self, value, /)
- Return self==value.
- __ge__(self, value, /)
- Return self>=value.
- __gt__(self, value, /)
- Return self>value.
- __hash__(self, /)
- Return hash(self).
- __le__(self, value, /)
- Return self<=value.
- __lt__(self, value, /)
- Return self<value.
- __ne__(self, value, /)
- Return self!=value.
- __reduce_cython__(...)
- __repr__(self, /)
- Return repr(self).
- __setstate_cython__(...)
- __str__(self, /)
- Return str(self).
- isoformat(...)
- Format Timedelta as ISO 8601 Duration like
``P[n]Y[n]M[n]DT[n]H[n]M[n]S``, where the ``[n]`` s are replaced by the
values. See https://en.wikipedia.org/wiki/ISO_8601#Durations.
Returns
-------
str
See Also
--------
Timestamp.isoformat : Function is used to convert the given
Timestamp object into the ISO format.
Notes
-----
The longest component is days, whose value may be larger than
365.
Every component is always included, even if its value is 0.
Pandas uses nanosecond precision, so up to 9 decimal places may
be included in the seconds component.
Trailing 0's are removed from the seconds component after the decimal.
We do not 0 pad components, so it's `...T5H...`, not `...T05H...`
Examples
--------
>>> td = pd.Timedelta(days=6, minutes=50, seconds=3,
... milliseconds=10, microseconds=10, nanoseconds=12)
>>> td.isoformat()
'P6DT0H50M3.010010012S'
>>> pd.Timedelta(hours=1, seconds=10).isoformat()
'P0DT1H0M10S'
>>> pd.Timedelta(days=500.5).isoformat()
'P500DT12H0M0S'
- to_numpy(...)
- Convert the Timedelta to a NumPy timedelta64.
.. versionadded:: 0.25.0
This is an alias method for `Timedelta.to_timedelta64()`. The dtype and
copy parameters are available here only for compatibility. Their values
will not affect the return value.
Returns
-------
numpy.timedelta64
See Also
--------
Series.to_numpy : Similar method for Series.
- to_pytimedelta(...)
- Convert a pandas Timedelta object into a python ``datetime.timedelta`` object.
Timedelta objects are internally saved as numpy datetime64[ns] dtype.
Use to_pytimedelta() to convert to object dtype.
Returns
-------
datetime.timedelta or numpy.array of datetime.timedelta
See Also
--------
to_timedelta : Convert argument to Timedelta type.
Notes
-----
Any nanosecond resolution will be lost.
- to_timedelta64(...)
- Return a numpy.timedelta64 object with 'ns' precision.
- view(...)
- Array view compatibility.
Data descriptors inherited from _Timedelta:
- asm8
- Return a numpy timedelta64 array scalar view.
Provides access to the array scalar view (i.e. a combination of the
value and the units) associated with the numpy.timedelta64().view(),
including a 64-bit integer representation of the timedelta in
nanoseconds (Python int compatible).
Returns
-------
numpy timedelta64 array scalar view
Array scalar view of the timedelta in nanoseconds.
Examples
--------
>>> td = pd.Timedelta('1 days 2 min 3 us 42 ns')
>>> td.asm8
numpy.timedelta64(86520000003042,'ns')
>>> td = pd.Timedelta('2 min 3 s')
>>> td.asm8
numpy.timedelta64(123000000000,'ns')
>>> td = pd.Timedelta('3 ms 5 us')
>>> td.asm8
numpy.timedelta64(3005000,'ns')
>>> td = pd.Timedelta(42, unit='ns')
>>> td.asm8
numpy.timedelta64(42,'ns')
- components
- Return a components namedtuple-like.
- delta
- Return the timedelta in nanoseconds (ns), for internal compatibility.
Returns
-------
int
Timedelta in nanoseconds.
Examples
--------
>>> td = pd.Timedelta('1 days 42 ns')
>>> td.delta
86400000000042
>>> td = pd.Timedelta('3 s')
>>> td.delta
3000000000
>>> td = pd.Timedelta('3 ms 5 us')
>>> td.delta
3005000
>>> td = pd.Timedelta(42, unit='ns')
>>> td.delta
42
- freq
- is_populated
- nanoseconds
- Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Returns
-------
int
Number of nanoseconds.
See Also
--------
Timedelta.components : Return all attributes with assigned values
(i.e. days, hours, minutes, seconds, milliseconds, microseconds,
nanoseconds).
Examples
--------
**Using string input**
>>> td = pd.Timedelta('1 days 2 min 3 us 42 ns')
>>> td.nanoseconds
42
**Using integer input**
>>> td = pd.Timedelta(42, unit='ns')
>>> td.nanoseconds
42
- resolution_string
- Return a string representing the lowest timedelta resolution.
Each timedelta has a defined resolution that represents the lowest OR
most granular level of precision. Each level of resolution is
represented by a short string as defined below:
Resolution: Return value
* Days: 'D'
* Hours: 'H'
* Minutes: 'T'
* Seconds: 'S'
* Milliseconds: 'L'
* Microseconds: 'U'
* Nanoseconds: 'N'
Returns
-------
str
Timedelta resolution.
Examples
--------
>>> td = pd.Timedelta('1 days 2 min 3 us 42 ns')
>>> td.resolution_string
'N'
>>> td = pd.Timedelta('1 days 2 min 3 us')
>>> td.resolution_string
'U'
>>> td = pd.Timedelta('2 min 3 s')
>>> td.resolution_string
'S'
>>> td = pd.Timedelta(36, unit='us')
>>> td.resolution_string
'U'
- value
Data and other attributes inherited from _Timedelta:
- __array_priority__ = 100
- __pyx_vtable__ = <capsule object NULL>
Methods inherited from datetime.timedelta:
- __getattribute__(self, name, /)
- Return getattr(self, name).
- total_seconds(...)
- Total seconds in the duration.
Data descriptors inherited from datetime.timedelta:
- days
- Number of days.
- microseconds
- Number of microseconds (>= 0 and less than 1 second).
- seconds
- Number of seconds (>= 0 and less than 1 day).
|
class TimedeltaIndex(pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin) |
| |
TimedeltaIndex(data=None, unit=None, freq=<no_default>, closed=None, dtype=dtype('<m8[ns]'), copy=False, name=None)
Immutable ndarray of timedelta64 data, represented internally as int64, and
which can be boxed to timedelta objects.
Parameters
----------
data : array-like (1-dimensional), optional
Optional timedelta-like data to construct index with.
unit : unit of the arg (D,h,m,s,ms,us,ns) denote the unit, optional
Which is an integer/float number.
freq : str or pandas offset object, optional
One of pandas date offset strings or corresponding objects. The string
'infer' can be passed in order to set the frequency of the index as the
inferred frequency upon creation.
copy : bool
Make a copy of input ndarray.
name : object
Name to be stored in the index.
Attributes
----------
days
seconds
microseconds
nanoseconds
components
inferred_freq
Methods
-------
to_pytimedelta
to_series
round
floor
ceil
to_frame
mean
See Also
--------
Index : The base pandas Index type.
Timedelta : Represents a duration between two dates or times.
DatetimeIndex : Index of datetime64 data.
PeriodIndex : Index of Period data.
timedelta_range : Create a fixed-frequency TimedeltaIndex.
Notes
-----
To learn more about the frequency strings, please see `this link
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`__. |
| |
- Method resolution order:
- TimedeltaIndex
- pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin
- pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin
- pandas.core.indexes.extension.NDArrayBackedExtensionIndex
- pandas.core.indexes.extension.ExtensionIndex
- pandas.core.indexes.base.Index
- pandas.core.base.IndexOpsMixin
- pandas.core.arraylike.OpsMixin
- pandas.core.base.PandasObject
- pandas.core.accessor.DirNamesMixin
- builtins.object
Methods defined here:
- __abs__(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- __neg__(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- __pos__(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- ceil(self, *args, **kwargs)
- Perform ceil operation on the data to the specified `freq`.
Parameters
----------
freq : str or Offset
The frequency level to ceil the index to. Must be a fixed
frequency like 'S' (second) not 'ME' (month end). See
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
DatetimeIndex, TimedeltaIndex, or Series
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
Raises
------
ValueError if the `freq` cannot be converted.
Notes
-----
If the timestamps have a timezone, ceiling will take place relative to the
local ("wall") time and re-localized to the same timezone. When ceiling
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
**DatetimeIndex**
>>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='T')
>>> rng.ceil('H')
DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',
'2018-01-01 13:00:00'],
dtype='datetime64[ns]', freq=None)
**Series**
>>> pd.Series(rng).dt.ceil("H")
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 13:00:00
dtype: datetime64[ns]
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> rng_tz = pd.DatetimeIndex(["2021-10-31 01:30:00"], tz="Europe/Amsterdam")
>>> rng_tz.ceil("H", ambiguous=False)
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
>>> rng_tz.ceil("H", ambiguous=True)
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
- floor(self, *args, **kwargs)
- Perform floor operation on the data to the specified `freq`.
Parameters
----------
freq : str or Offset
The frequency level to floor the index to. Must be a fixed
frequency like 'S' (second) not 'ME' (month end). See
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
DatetimeIndex, TimedeltaIndex, or Series
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
Raises
------
ValueError if the `freq` cannot be converted.
Notes
-----
If the timestamps have a timezone, flooring will take place relative to the
local ("wall") time and re-localized to the same timezone. When flooring
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
**DatetimeIndex**
>>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='T')
>>> rng.floor('H')
DatetimeIndex(['2018-01-01 11:00:00', '2018-01-01 12:00:00',
'2018-01-01 12:00:00'],
dtype='datetime64[ns]', freq=None)
**Series**
>>> pd.Series(rng).dt.floor("H")
0 2018-01-01 11:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
dtype: datetime64[ns]
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> rng_tz = pd.DatetimeIndex(["2021-10-31 03:30:00"], tz="Europe/Amsterdam")
>>> rng_tz.floor("2H", ambiguous=False)
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
>>> rng_tz.floor("2H", ambiguous=True)
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
- get_loc(self, key, method=None, tolerance=None)
- Get integer location for requested label
Returns
-------
loc : int, slice, or ndarray[int]
- median(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- round(self, *args, **kwargs)
- Perform round operation on the data to the specified `freq`.
Parameters
----------
freq : str or Offset
The frequency level to round the index to. Must be a fixed
frequency like 'S' (second) not 'ME' (month end). See
:ref:`frequency aliases <timeseries.offset_aliases>` for
a list of possible `freq` values.
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
Only relevant for DatetimeIndex:
- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- 'NaT' will return NaT where there are ambiguous times
- 'raise' will raise an AmbiguousTimeError if there are ambiguous
times.
nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- 'shift_forward' will shift the nonexistent time forward to the
closest existing time
- 'shift_backward' will shift the nonexistent time backward to the
closest existing time
- 'NaT' will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
DatetimeIndex, TimedeltaIndex, or Series
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
Raises
------
ValueError if the `freq` cannot be converted.
Notes
-----
If the timestamps have a timezone, rounding will take place relative to the
local ("wall") time and re-localized to the same timezone. When rounding
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
**DatetimeIndex**
>>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='T')
>>> rng.round('H')
DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',
'2018-01-01 12:00:00'],
dtype='datetime64[ns]', freq=None)
**Series**
>>> pd.Series(rng).dt.round("H")
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
dtype: datetime64[ns]
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> rng_tz = pd.DatetimeIndex(["2021-10-31 03:30:00"], tz="Europe/Amsterdam")
>>> rng_tz.floor("2H", ambiguous=False)
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
>>> rng_tz.floor("2H", ambiguous=True)
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
- std(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- sum(self, *args, **kwargs)
- # error: Incompatible redefinition (redefinition with type "Callable[[Any,
# VarArg(Any), KwArg(Any)], Any]", original type "property")
- to_pytimedelta(self, *args, **kwargs)
- Return Timedelta Array/Index as object ndarray of datetime.timedelta
objects.
Returns
-------
timedeltas : ndarray[object]
- total_seconds(self, *args, **kwargs)
- Return total duration of each element expressed in seconds.
This method is available directly on TimedeltaArray, TimedeltaIndex
and on Series containing timedelta values under the ``.dt`` namespace.
Returns
-------
seconds : [ndarray, Float64Index, Series]
When the calling object is a TimedeltaArray, the return type
is ndarray. When the calling object is a TimedeltaIndex,
the return type is a Float64Index. When the calling object
is a Series, the return type is Series of type `float64` whose
index is the same as the original.
See Also
--------
datetime.timedelta.total_seconds : Standard library version
of this method.
TimedeltaIndex.components : Return a DataFrame with components of
each Timedelta.
Examples
--------
**Series**
>>> s = pd.Series(pd.to_timedelta(np.arange(5), unit='d'))
>>> s
0 0 days
1 1 days
2 2 days
3 3 days
4 4 days
dtype: timedelta64[ns]
>>> s.dt.total_seconds()
0 0.0
1 86400.0
2 172800.0
3 259200.0
4 345600.0
dtype: float64
**TimedeltaIndex**
>>> idx = pd.to_timedelta(np.arange(5), unit='d')
>>> idx
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'],
dtype='timedelta64[ns]', freq=None)
>>> idx.total_seconds()
Float64Index([0.0, 86400.0, 172800.0, 259200.00000000003, 345600.0],
dtype='float64')
Static methods defined here:
- __new__(cls, data=None, unit=None, freq=<no_default>, closed=None, dtype=dtype('<m8[ns]'), copy=False, name=None)
- Create and return a new object. See help(type) for accurate signature.
Readonly properties defined here:
- inferred_type
- Return a string of the type inferred from the values.
Data descriptors defined here:
- components
- Return a dataframe of the components (days, hours, minutes,
seconds, milliseconds, microseconds, nanoseconds) of the Timedeltas.
Returns
-------
DataFrame
- days
- Number of days for each element.
- microseconds
- Number of microseconds (>= 0 and less than 1 second) for each element.
- nanoseconds
- Number of nanoseconds (>= 0 and less than 1 microsecond) for each element.
- seconds
- Number of seconds (>= 0 and less than 1 day) for each element.
Data and other attributes defined here:
- __annotations__ = {'_data': 'TimedeltaArray'}
Methods inherited from pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin:
- delete(self, loc)
- Make new Index with passed location(-s) deleted.
Parameters
----------
loc : int or list of int
Location of item(-s) which will be deleted.
Use a list of locations to delete more than one value at the same time.
Returns
-------
Index
Will be same type as self, except for RangeIndex.
See Also
--------
numpy.delete : Delete any rows and column from NumPy array (ndarray).
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete(1)
Index(['a', 'c'], dtype='object')
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx.delete([0, 2])
Index(['b'], dtype='object')
- insert(self, loc: 'int', item)
- Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
Parameters
----------
loc : int
item : object
Returns
-------
new_index : Index
- is_type_compatible(self, kind: 'str') -> 'bool'
- Whether the index type is compatible with the provided type.
- take(self, indices, axis=0, allow_fill=True, fill_value=None, **kwargs)
- Return a new Index of the values selected by the indices.
For internal compatibility with numpy arrays.
Parameters
----------
indices : array-like
Indices to be taken.
axis : int, optional
The axis over which to select values, always 0.
allow_fill : bool, default True
fill_value : scalar, default None
If allow_fill=True and fill_value is not None, indices specified by
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
Returns
-------
Index
An index formed of elements at the given indices. Will be the same
type as self, except for RangeIndex.
See Also
--------
numpy.ndarray.take: Return an array formed from the
elements of a at the given indices.
Readonly properties inherited from pandas.core.indexes.datetimelike.DatetimeTimedeltaMixin:
- values
- Return an array representing the data in the Index.
.. warning::
We recommend using :attr:`Index.array` or
:meth:`Index.to_numpy`, depending on whether you need
a reference to the underlying data or a NumPy array.
Returns
-------
array: numpy.ndarray or ExtensionArray
See Also
--------
Index.array : Reference to the underlying data.
Index.to_numpy : A NumPy array representing the underlying data.
Methods inherited from pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin:
- __contains__(self, key: 'Any') -> 'bool'
- Return a boolean indicating whether the provided key is in the index.
Parameters
----------
key : label
The key to check if it is present in the index.
Returns
-------
bool
Whether the key search is in the index.
Raises
------
TypeError
If the key is not hashable.
See Also
--------
Index.isin : Returns an ndarray of boolean dtype indicating whether the
list-like key is in the index.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> 2 in idx
True
>>> 6 in idx
False
- equals(self, other: 'Any') -> 'bool'
- Determines if two Index objects contain the same elements.
- format(self, name: 'bool' = False, formatter: 'Callable | None' = None, na_rep: 'str' = 'NaT', date_format: 'str | None' = None) -> 'list[str]'
- Render a string representation of the Index.
- mean(self, *args, **kwargs)
- Return the mean value of the Array.
.. versionadded:: 0.25.0
Parameters
----------
skipna : bool, default True
Whether to ignore any NaT elements.
axis : int, optional, default 0
Returns
-------
scalar
Timestamp or Timedelta.
See Also
--------
numpy.ndarray.mean : Returns the average of array elements along a given axis.
Series.mean : Return the mean value in a Series.
Notes
-----
mean is only defined for Datetime and Timedelta dtypes, not for Period.
- shift(self: '_T', periods: 'int' = 1, freq=None) -> '_T'
- Shift index by desired number of time frequency increments.
This method is for shifting the values of datetime-like indexes
by a specified time increment a given number of times.
Parameters
----------
periods : int, default 1
Number of periods (or increments) to shift by,
can be positive or negative.
freq : pandas.DateOffset, pandas.Timedelta or string, optional
Frequency increment to shift by.
If None, the index is shifted by its own `freq` attribute.
Offset aliases are valid strings, e.g., 'D', 'W', 'M' etc.
Returns
-------
pandas.DatetimeIndex
Shifted index.
See Also
--------
Index.shift : Shift values of Index.
PeriodIndex.shift : Shift values of PeriodIndex.
Data descriptors inherited from pandas.core.indexes.datetimelike.DatetimeIndexOpsMixin:
- asi8
- Integer representation of the values.
Returns
-------
ndarray
An ndarray with int64 dtype.
- freq
- Return the frequency object if it is set, otherwise None.
- freqstr
- Return the frequency object as a string if its set, otherwise None.
- hasnans
- return if I have any nans; enables various perf speedups
- inferred_freq
- Tries to return a string representing a frequency guess,
generated by infer_freq. Returns None if it can't autodetect the
frequency.
- resolution
- Returns day, hour, minute, second, millisecond or microsecond
Methods inherited from pandas.core.indexes.extension.ExtensionIndex:
- map(self, mapper, na_action=None)
- Map values using an input mapping or function.
Parameters
----------
mapper : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
applied : Union[Index, MultiIndex], inferred
The output of the mapping function applied to the index.
If the function returns a tuple with more than one element
a MultiIndex will be returned.
Methods inherited from pandas.core.indexes.base.Index:
- __and__(self, other)
- __array__(self, dtype=None) -> 'np.ndarray'
- The array interface, return my values.
- __array_ufunc__(self, ufunc: 'np.ufunc', method: 'str_t', *inputs, **kwargs)
- __array_wrap__(self, result, context=None)
- Gets called after a ufunc and other functions e.g. np.split.
- __bool__ = __nonzero__(self)
- __copy__(self: '_IndexT', **kwargs) -> '_IndexT'
- __deepcopy__(self: '_IndexT', memo=None) -> '_IndexT'
- Parameters
----------
memo, default None
Standard signature. Unused
- __getitem__(self, key)
- Override numpy.ndarray's __getitem__ method to work as desired.
This function adds lists and Series as valid boolean indexers
(ndarrays only supports ndarray with dtype=bool).
If resulting ndim != 1, plain ndarray is returned instead of
corresponding `Index` subclass.
- __iadd__(self, other)
- __invert__(self)
- __len__(self) -> 'int'
- Return the length of the Index.
- __nonzero__(self)
- __or__(self, other)
- __reduce__(self)
- Helper for pickle.
- __repr__(self) -> 'str_t'
- Return a string representation for this object.
- __setitem__(self, key, value)
- __xor__(self, other)
- all(self, *args, **kwargs)
- Return whether all elements are Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
all : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.any : Return whether any element in an Index is True.
Series.any : Return whether any element in a Series is True.
Series.all : Return whether all elements in a Series are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
True, because nonzero integers are considered True.
>>> pd.Index([1, 2, 3]).all()
True
False, because ``0`` is considered False.
>>> pd.Index([0, 1, 2]).all()
False
- any(self, *args, **kwargs)
- Return whether any element is Truthy.
Parameters
----------
*args
Required for compatibility with numpy.
**kwargs
Required for compatibility with numpy.
Returns
-------
any : bool or array-like (if axis is specified)
A single element array-like may be converted to bool.
See Also
--------
Index.all : Return whether all elements are True.
Series.all : Return whether all elements are True.
Notes
-----
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
Examples
--------
>>> index = pd.Index([0, 1, 2])
>>> index.any()
True
>>> index = pd.Index([0, 0, 0])
>>> index.any()
False
- append(self, other: 'Index | Sequence[Index]') -> 'Index'
- Append a collection of Index options together.
Parameters
----------
other : Index or list/tuple of indices
Returns
-------
Index
- argmax(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the maximum value.
See Also
--------
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argmin(self, axis=None, skipna=True, *args, **kwargs)
- Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
int
Row position of the minimum value.
See Also
--------
Series.argmin : Return position of the minimum value.
Series.argmax : Return position of the maximum value.
numpy.ndarray.argmin : Equivalent method for numpy arrays.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.
Examples
--------
Consider dataset containing cereal calories
>>> s = pd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
- argsort(self, *args, **kwargs) -> 'npt.NDArray[np.intp]'
- Return the integer indices that would sort the index.
Parameters
----------
*args
Passed to `numpy.ndarray.argsort`.
**kwargs
Passed to `numpy.ndarray.argsort`.
Returns
-------
np.ndarray[np.intp]
Integer indices that would sort the index if used as
an indexer.
See Also
--------
numpy.argsort : Similar method for NumPy arrays.
Index.sort_values : Return sorted copy of Index.
Examples
--------
>>> idx = pd.Index(['b', 'a', 'd', 'c'])
>>> idx
Index(['b', 'a', 'd', 'c'], dtype='object')
>>> order = idx.argsort()
>>> order
array([1, 0, 3, 2])
>>> idx[order]
Index(['a', 'b', 'c', 'd'], dtype='object')
- asof(self, label)
- Return the label from the index, or, if not present, the previous one.
Assuming that the index is sorted, return the passed index label if it
is in the index, or return the previous index label if the passed one
is not in the index.
Parameters
----------
label : object
The label up to which the method returns the latest index label.
Returns
-------
object
The passed label if it is in the index. The previous label if the
passed label is not in the sorted index or `NaN` if there is no
such label.
See Also
--------
Series.asof : Return the latest value in a Series up to the
passed index.
merge_asof : Perform an asof merge (similar to left join but it
matches on nearest key rather than equal key).
Index.get_loc : An `asof` is a thin wrapper around `get_loc`
with method='pad'.
Examples
--------
`Index.asof` returns the latest index label up to the passed label.
>>> idx = pd.Index(['2013-12-31', '2014-01-02', '2014-01-03'])
>>> idx.asof('2014-01-01')
'2013-12-31'
If the label is in the index, the method returns the passed label.
>>> idx.asof('2014-01-02')
'2014-01-02'
If all of the labels in the index are later than the passed label,
NaN is returned.
>>> idx.asof('1999-01-02')
nan
If the index is not sorted, an error is raised.
>>> idx_not_sorted = pd.Index(['2013-12-31', '2015-01-02',
... '2014-01-03'])
>>> idx_not_sorted.asof('2013-12-31')
Traceback (most recent call last):
ValueError: index must be monotonic increasing or decreasing
- asof_locs(self, where: 'Index', mask: 'np.ndarray') -> 'npt.NDArray[np.intp]'
- Return the locations (indices) of labels in the index.
As in the `asof` function, if the label (a particular entry in
`where`) is not in the index, the latest index label up to the
passed label is chosen and its index returned.
If all of the labels in the index are later than a label in `where`,
-1 is returned.
`mask` is used to ignore NA values in the index during calculation.
Parameters
----------
where : Index
An Index consisting of an array of timestamps.
mask : np.ndarray[bool]
Array of booleans denoting where values in the original
data are not NA.
Returns
-------
np.ndarray[np.intp]
An array of locations (indices) of the labels from the Index
which correspond to the return values of the `asof` function
for every element in `where`.
- astype(self, dtype, copy: 'bool' = True)
- Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a TypeError exception is raised.
Parameters
----------
dtype : numpy dtype or pandas type
Note that any signed integer `dtype` is treated as ``'int64'``,
and any unsigned integer `dtype` is treated as ``'uint64'``,
regardless of the size.
copy : bool, default True
By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
Returns
-------
Index
Index with values cast to specified dtype.
- copy(self: '_IndexT', name: 'Hashable | None' = None, deep: 'bool' = False, dtype: 'Dtype | None' = None, names: 'Sequence[Hashable] | None' = None) -> '_IndexT'
- Make a copy of this object.
Name and dtype sets those attributes on the new object.
Parameters
----------
name : Label, optional
Set name for new object.
deep : bool, default False
dtype : numpy dtype or pandas type, optional
Set dtype for new object.
.. deprecated:: 1.2.0
use ``astype`` method instead.
names : list-like, optional
Kept for compatibility with MultiIndex. Should not be used.
.. deprecated:: 1.4.0
use ``name`` instead.
Returns
-------
Index
Index refer to new object which is a copy of this object.
Notes
-----
In most cases, there should be no functional difference from using
``deep``, but if ``deep`` is passed it will attempt to deepcopy.
- difference(self, other, sort=None)
- Return a new Index with elements of index not in `other`.
This is the set difference of two Index objects.
Parameters
----------
other : Index or array-like
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
difference : Index
Examples
--------
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')
- drop(self, labels, errors: 'str_t' = 'raise') -> 'Index'
- Make new Index with passed list of labels deleted.
Parameters
----------
labels : array-like or scalar
errors : {'ignore', 'raise'}, default 'raise'
If 'ignore', suppress error and existing labels are dropped.
Returns
-------
dropped : Index
Will be same type as self, except for RangeIndex.
Raises
------
KeyError
If not all of the labels are found in the selected axis
- drop_duplicates(self: '_IndexT', keep: 'str_t | bool' = 'first') -> '_IndexT'
- Return Index with duplicate values removed.
Parameters
----------
keep : {'first', 'last', ``False``}, default 'first'
- 'first' : Drop duplicates except for the first occurrence.
- 'last' : Drop duplicates except for the last occurrence.
- ``False`` : Drop all duplicates.
Returns
-------
deduplicated : Index
See Also
--------
Series.drop_duplicates : Equivalent method on Series.
DataFrame.drop_duplicates : Equivalent method on DataFrame.
Index.duplicated : Related method on Index, indicating duplicate
Index values.
Examples
--------
Generate an pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The `keep` parameter controls which duplicate values are removed.
The value 'first' keeps the first occurrence for each
set of duplicated entries. The default value of keep is 'first'.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value 'last' keeps the last occurrence for each set of duplicated
entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value ``False`` discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')
- droplevel(self, level=0)
- Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex.
Parameters
----------
level : int, str, or list-like, default 0
If a string is given, must be the name of a level
If list-like, elements must be names or indexes of levels.
Returns
-------
Index or MultiIndex
Examples
--------
>>> mi = pd.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
>>> mi.droplevel()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
>>> mi.droplevel(2)
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel('z')
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
>>> mi.droplevel(['x', 'y'])
Int64Index([5, 6], dtype='int64', name='z')
- dropna(self: '_IndexT', how: 'str_t' = 'any') -> '_IndexT'
- Return Index without NA/NaN values.
Parameters
----------
how : {'any', 'all'}, default 'any'
If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
Returns
-------
Index
- duplicated(self, keep: "Literal[('first', 'last', False)]" = 'first') -> 'npt.NDArray[np.bool_]'
- Indicate duplicate index values.
Duplicated values are indicated as ``True`` values in the resulting
array. Either all duplicates, all except the first, or all except the
last occurrence of duplicates can be indicated.
Parameters
----------
keep : {'first', 'last', False}, default 'first'
The value or values in a set of duplicates to mark as missing.
- 'first' : Mark duplicates as ``True`` except for the first
occurrence.
- 'last' : Mark duplicates as ``True`` except for the last
occurrence.
- ``False`` : Mark all duplicates as ``True``.
Returns
-------
np.ndarray[bool]
See Also
--------
Series.duplicated : Equivalent method on pandas.Series.
DataFrame.duplicated : Equivalent method on pandas.DataFrame.
Index.drop_duplicates : Remove duplicate values from Index.
Examples
--------
By default, for each set of duplicated values, the first occurrence is
set to False and all others to True:
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
which is equivalent to
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
By using 'last', the last occurrence of each set of duplicated values
is set on False and all others on True:
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
By setting keep on ``False``, all duplicates are True:
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])
- fillna(self, value=None, downcast=None)
- Fill NA/NaN values with the specified value.
Parameters
----------
value : scalar
Scalar value to use to fill holes (e.g. 0).
This value cannot be a list-likes.
downcast : dict, default is None
A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
Returns
-------
Index
See Also
--------
DataFrame.fillna : Fill NaN values of a DataFrame.
Series.fillna : Fill NaN Values of a Series.
- get_indexer(self, target, method: 'str_t | None' = None, limit: 'int | None' = None, tolerance=None) -> 'npt.NDArray[np.intp]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
Notes
-----
Returns -1 for unmatched values, for further explanation see the
example below.
Examples
--------
>>> index = pd.Index(['c', 'a', 'b'])
>>> index.get_indexer(['a', 'b', 'x'])
array([ 1, 2, -1])
Notice that the return value is an array of locations in ``index``
and ``x`` is marked by -1, as it is not in ``index``.
- get_indexer_for(self, target) -> 'npt.NDArray[np.intp]'
- Guaranteed return of an indexer even when non-unique.
This dispatches to get_indexer or get_indexer_non_unique
as appropriate.
Returns
-------
np.ndarray[np.intp]
List of indices.
Examples
--------
>>> idx = pd.Index([np.nan, 'var1', np.nan])
>>> idx.get_indexer_for([np.nan])
array([0, 2])
- get_indexer_non_unique(self, target) -> 'tuple[npt.NDArray[np.intp], npt.NDArray[np.intp]]'
- Compute indexer and mask for new index given the current index. The
indexer should be then used as an input to ndarray.take to align the
current data to the new index.
Parameters
----------
target : Index
Returns
-------
indexer : np.ndarray[np.intp]
Integers from 0 to n - 1 indicating that the index at these
positions matches the corresponding target values. Missing values
in the target are marked by -1.
missing : np.ndarray[np.intp]
An indexer into the target of the values not found.
These correspond to the -1 in the indexer array.
- get_level_values = _get_level_values(self, level) -> 'Index'
- get_slice_bound(self, label, side: "Literal[('left', 'right')]", kind=<no_default>) -> 'int'
- Calculate slice bound that corresponds to given label.
Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.
Parameters
----------
label : object
side : {'left', 'right'}
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
int
Index of label.
- get_value(self, series: 'Series', key)
- Fast lookup of value from 1-dimensional ndarray.
Only use this if you know what you're doing.
Returns
-------
scalar or Series
- groupby(self, values) -> 'PrettyDict[Hashable, np.ndarray]'
- Group the index labels by a given array of values.
Parameters
----------
values : array
Values used to determine the groups.
Returns
-------
dict
{group name -> group labels}
- holds_integer(self) -> 'bool'
- Whether the type is an integer type.
- identical(self, other) -> 'bool'
- Similar to equals, but checks that object attributes and types are also equal.
Returns
-------
bool
If two Index objects have equal elements and same type True,
otherwise False.
- intersection(self, other, sort=False)
- Form the intersection of two Index objects.
This returns a new Index with elements common to the index and `other`.
Parameters
----------
other : Index or array-like
sort : False or None, default False
Whether to sort the resulting index.
* False : do not sort the result.
* None : sort the result, except when `self` and `other` are equal
or when the values cannot be compared.
Returns
-------
intersection : Index
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')
- is_(self, other) -> 'bool'
- More flexible, faster check like ``is`` but that works through views.
Note: this is *not* the same as ``Index.identical()``, which checks
that metadata is also the same.
Parameters
----------
other : object
Other object to compare against.
Returns
-------
bool
True if both have same underlying data, False otherwise.
See Also
--------
Index.identical : Works like ``Index.is_`` but also checks metadata.
- is_boolean(self) -> 'bool'
- Check if the Index only consists of booleans.
Returns
-------
bool
Whether or not the Index only consists of booleans.
See Also
--------
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([True, False, True])
>>> idx.is_boolean()
True
>>> idx = pd.Index(["True", "False", "True"])
>>> idx.is_boolean()
False
>>> idx = pd.Index([True, False, "True"])
>>> idx.is_boolean()
False
- is_categorical(self) -> 'bool'
- Check if the Index holds categorical data.
Returns
-------
bool
True if the Index is categorical.
See Also
--------
CategoricalIndex : Index for categorical data.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_categorical()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_categorical()
False
>>> s = pd.Series(["Peter", "Victor", "Elisabeth", "Mar"])
>>> s
0 Peter
1 Victor
2 Elisabeth
3 Mar
dtype: object
>>> s.index.is_categorical()
False
- is_floating(self) -> 'bool'
- Check if the Index is a floating type.
The Index may consist of only floats, NaNs, or a mix of floats,
integers, or NaNs.
Returns
-------
bool
Whether or not the Index only consists of only consists of floats, NaNs, or
a mix of floats, integers, or NaNs.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1.0, 2.0, np.nan, 4.0])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4, np.nan])
>>> idx.is_floating()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_floating()
False
- is_integer(self) -> 'bool'
- Check if the Index only consists of integers.
Returns
-------
bool
Whether or not the Index only consists of integers.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_integer()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_integer()
False
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_integer()
False
- is_interval(self) -> 'bool'
- Check if the Index holds Interval objects.
Returns
-------
bool
Whether or not the Index holds Interval objects.
See Also
--------
IntervalIndex : Index for Interval objects.
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([pd.Interval(left=0, right=5),
... pd.Interval(left=5, right=10)])
>>> idx.is_interval()
True
>>> idx = pd.Index([1, 3, 5, 7])
>>> idx.is_interval()
False
- is_mixed(self) -> 'bool'
- Check if the Index holds data with mixed data types.
Returns
-------
bool
Whether or not the Index holds data with mixed data types.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
Examples
--------
>>> idx = pd.Index(['a', np.nan, 'b'])
>>> idx.is_mixed()
True
>>> idx = pd.Index([1.0, 2.0, 3.0, 5.0])
>>> idx.is_mixed()
False
- is_numeric(self) -> 'bool'
- Check if the Index only consists of numeric data.
Returns
-------
bool
Whether or not the Index only consists of numeric data.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_object : Check if the Index is of the object dtype.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan])
>>> idx.is_numeric()
True
>>> idx = pd.Index([1, 2, 3, 4.0, np.nan, "Apple"])
>>> idx.is_numeric()
False
- is_object(self) -> 'bool'
- Check if the Index is of the object dtype.
Returns
-------
bool
Whether or not the Index is of the object dtype.
See Also
--------
is_boolean : Check if the Index only consists of booleans.
is_integer : Check if the Index only consists of integers.
is_floating : Check if the Index is a floating type.
is_numeric : Check if the Index only consists of numeric data.
is_categorical : Check if the Index holds categorical data.
is_interval : Check if the Index holds Interval objects.
is_mixed : Check if the Index holds data with mixed data types.
Examples
--------
>>> idx = pd.Index(["Apple", "Mango", "Watermelon"])
>>> idx.is_object()
True
>>> idx = pd.Index(["Apple", "Mango", 2.0])
>>> idx.is_object()
True
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.is_object()
False
>>> idx = pd.Index([1.0, 2.0, 3.0, 4.0])
>>> idx.is_object()
False
- isin(self, values, level=None) -> 'np.ndarray'
- Return a boolean array where the index values are in `values`.
Compute boolean array of whether each index value is found in the
passed set of values. The length of the returned boolean array matches
the length of the index.
Parameters
----------
values : set or list-like
Sought values.
level : str or int, optional
Name or position of the index level to use (if the index is a
`MultiIndex`).
Returns
-------
np.ndarray[bool]
NumPy array of boolean values.
See Also
--------
Series.isin : Same for Series.
DataFrame.isin : Same method for DataFrames.
Notes
-----
In the case of `MultiIndex` you must either specify `values` as a
list-like object containing tuples that are the same length as the
number of levels, or specify `level`. Otherwise it will raise a
``ValueError``.
If `level` is specified:
- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.
Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')
Check whether each index value in a list of values.
>>> idx.isin([1, 4])
array([ True, False, False])
>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red', 'blue', 'green']],
... names=('number', 'color'))
>>> midx
MultiIndex([(1, 'red'),
(2, 'blue'),
(3, 'green')],
names=['number', 'color'])
Check whether the strings in the 'color' level of the MultiIndex
are in a list of colors.
>>> midx.isin(['red', 'orange', 'yellow'], level='color')
array([ True, False, False])
To check across the levels of a MultiIndex, pass a list of tuples:
>>> midx.isin([(1, 'red'), (3, 'red')])
array([ True, False, False])
For a DatetimeIndex, string values in `values` are converted to
Timestamps.
>>> dates = ['2000-03-11', '2000-03-12', '2000-03-13']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],
dtype='datetime64[ns]', freq=None)
>>> dti.isin(['2000-03-11'])
array([ True, False, False])
- isna(self) -> 'npt.NDArray[np.bool_]'
- Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as ``None``, :attr:`numpy.NaN` or :attr:`pd.NaT`, get
mapped to ``True`` values.
Everything else get mapped to ``False`` values. Characters such as
empty strings `''` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
Returns
-------
numpy.ndarray[bool]
A boolean array of whether my values are NA.
See Also
--------
Index.notna : Boolean inverse of isna.
Index.dropna : Omit entries with missing values.
isna : Top-level isna.
Series.isna : Detect missing values in Series object.
Examples
--------
Show which entries in a pandas.Index are NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.isna()
array([False, False, True])
Empty strings are not considered NA values. None is considered an NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.isna()
array([False, False, False, True])
For datetimes, `NaT` (Not a Time) is considered as an NA value.
>>> idx = pd.DatetimeIndex([pd.Timestamp('1940-04-25'),
... pd.Timestamp(''), None, pd.NaT])
>>> idx
DatetimeIndex(['1940-04-25', 'NaT', 'NaT', 'NaT'],
dtype='datetime64[ns]', freq=None)
>>> idx.isna()
array([False, True, True, True])
- isnull = isna(self) -> 'npt.NDArray[np.bool_]'
- join(self, other, how: 'str_t' = 'left', level=None, return_indexers: 'bool' = False, sort: 'bool' = False)
- Compute join_index and indexers to conform data
structures to the new index.
Parameters
----------
other : Index
how : {'left', 'right', 'inner', 'outer'}
level : int or level name, default None
return_indexers : bool, default False
sort : bool, default False
Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword).
Returns
-------
join_index, (left_indexer, right_indexer)
- max(self, axis=None, skipna=True, *args, **kwargs)
- Return the maximum value of the Index.
Parameters
----------
axis : int, optional
For compatibility with NumPy. Only 0 or None are allowed.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Maximum value.
See Also
--------
Index.min : Return the minimum value in an Index.
Series.max : Return the maximum value in a Series.
DataFrame.max : Return the maximum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.max()
3
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.max()
'c'
For a MultiIndex, the maximum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.max()
('b', 2)
- memory_usage(self, deep: 'bool' = False) -> 'int'
- Memory usage of the values.
Parameters
----------
deep : bool, default False
Introspect the data deeply, interrogate
`object` dtypes for system-level memory consumption.
Returns
-------
bytes used
See Also
--------
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.
Notes
-----
Memory usage does not include memory consumed by elements that
are not components of the array if deep=False or if used on PyPy
- min(self, axis=None, skipna=True, *args, **kwargs)
- Return the minimum value of the Index.
Parameters
----------
axis : {None}
Dummy argument for consistency with Series.
skipna : bool, default True
Exclude NA/null values when showing the result.
*args, **kwargs
Additional arguments and keywords for compatibility with NumPy.
Returns
-------
scalar
Minimum value.
See Also
--------
Index.max : Return the maximum value of the object.
Series.min : Return the minimum value in a Series.
DataFrame.min : Return the minimum values in a DataFrame.
Examples
--------
>>> idx = pd.Index([3, 2, 1])
>>> idx.min()
1
>>> idx = pd.Index(['c', 'b', 'a'])
>>> idx.min()
'a'
For a MultiIndex, the minimum is determined lexicographically.
>>> idx = pd.MultiIndex.from_product([('a', 'b'), (2, 1)])
>>> idx.min()
('a', 1)
- notna(self) -> 'npt.NDArray[np.bool_]'
- Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to ``True``. Characters such as empty
strings ``''`` or :attr:`numpy.inf` are not considered NA values
(unless you set ``pandas.options.mode.use_inf_as_na = True``).
NA values, such as None or :attr:`numpy.NaN`, get mapped to ``False``
values.
Returns
-------
numpy.ndarray[bool]
Boolean array to indicate which entries are not NA.
See Also
--------
Index.notnull : Alias of notna.
Index.isna: Inverse of notna.
notna : Top-level notna.
Examples
--------
Show which entries in an Index are not NA. The result is an
array.
>>> idx = pd.Index([5.2, 6.0, np.NaN])
>>> idx
Float64Index([5.2, 6.0, nan], dtype='float64')
>>> idx.notna()
array([ True, True, False])
Empty strings are not considered NA values. None is considered a NA
value.
>>> idx = pd.Index(['black', '', 'red', None])
>>> idx
Index(['black', '', 'red', None], dtype='object')
>>> idx.notna()
array([ True, True, True, False])
- notnull = notna(self) -> 'npt.NDArray[np.bool_]'
- putmask(self, mask, value) -> 'Index'
- Return a new Index of the values set with the mask.
Returns
-------
Index
See Also
--------
numpy.ndarray.putmask : Changes elements of an array
based on conditional and input values.
- ravel(self, order='C')
- Return an ndarray of the flattened values of the underlying data.
Returns
-------
numpy.ndarray
Flattened array.
See Also
--------
numpy.ndarray.ravel : Return a flattened array.
- reindex(self, target, method=None, level=None, limit=None, tolerance=None) -> 'tuple[Index, npt.NDArray[np.intp] | None]'
- Create index with target's values.
Parameters
----------
target : an iterable
method : {None, 'pad'/'ffill', 'backfill'/'bfill', 'nearest'}, optional
* default: exact matches only.
* pad / ffill: find the PREVIOUS index value if no exact match.
* backfill / bfill: use NEXT index value if no exact match
* nearest: use the NEAREST index value if no exact match. Tied
distances are broken by preferring the larger index value.
level : int, optional
Level of multiindex.
limit : int, optional
Maximum number of consecutive labels in ``target`` to match for
inexact matches.
tolerance : int or float, optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.
Returns
-------
new_index : pd.Index
Resulting index.
indexer : np.ndarray[np.intp] or None
Indices of output values in original index.
Raises
------
TypeError
If ``method`` passed along with ``level``.
ValueError
If non-unique multi-index
ValueError
If non-unique index and ``method`` or ``limit`` passed.
See Also
--------
Series.reindex
DataFrame.reindex
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.reindex(['car', 'bike'])
(Index(['car', 'bike'], dtype='object'), array([0, 1]))
- rename(self, name, inplace=False)
- Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
Parameters
----------
name : label or list of labels
Name(s) to set.
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.set_names : Able to set new names partially and by level.
Examples
--------
>>> idx = pd.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade')
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]],
... names=['kind', 'year'])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year'])
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species')
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
- repeat(self, repeats, axis=None)
- Repeat elements of a Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
Parameters
----------
repeats : int or array of ints
The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
axis : None
Must be ``None``. Has no effect but is accepted for compatibility
with numpy.
Returns
-------
repeated_index : Index
Newly created Index with repeated elements.
See Also
--------
Series.repeat : Equivalent function for Series.
numpy.repeat : Similar method for :class:`numpy.ndarray`.
Examples
--------
>>> idx = pd.Index(['a', 'b', 'c'])
>>> idx
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2)
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3])
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
- set_names(self, names, level=None, inplace: 'bool' = False)
- Set Index or MultiIndex name.
Able to set new names partially and by level.
Parameters
----------
names : label or list of label or dict-like for MultiIndex
Name(s) to set.
.. versionchanged:: 1.3.0
level : int, label or list of int or label, optional
If the index is a MultiIndex and names is not dict-like, level(s) to set
(None for all levels). Otherwise level must be None.
.. versionchanged:: 1.3.0
inplace : bool, default False
Modifies the object directly, instead of creating a new Index or
MultiIndex.
Returns
-------
Index or None
The same type as the caller or None if ``inplace=True``.
See Also
--------
Index.rename : Able to set new names without level.
Examples
--------
>>> idx = pd.Index([1, 2, 3, 4])
>>> idx
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter')
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
>>> idx = pd.MultiIndex.from_product([['python', 'cobra'],
... [2018, 2019]])
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
)
>>> idx.set_names(['kind', 'year'], inplace=True)
>>> idx
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.set_names('species', level=0)
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
When renaming levels with a dict, levels can not be passed.
>>> idx.set_names({'kind': 'snake'})
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['snake', 'year'])
- set_value(self, arr, key, value)
- Fast lookup of value from 1-dimensional ndarray.
.. deprecated:: 1.0
Notes
-----
Only use this if you know what you're doing.
- slice_indexer(self, start: 'Hashable | None' = None, end: 'Hashable | None' = None, step: 'int | None' = None, kind=<no_default>) -> 'slice'
- Compute the slice indexer for input labels and step.
Index needs to be ordered and unique.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, default None
kind : str, default None
.. deprecated:: 1.4.0
Returns
-------
indexer : slice
Raises
------
KeyError : If key does not exist, or key is not unique and index is
not ordered.
Notes
-----
This function assumes that the data is sorted, so use at your own peril
Examples
--------
This is a method on all index types. For example you can do:
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_indexer(start='b', end='c')
slice(1, 3, None)
>>> idx = pd.MultiIndex.from_arrays([list('abcd'), list('efgh')])
>>> idx.slice_indexer(start='b', end=('c', 'g'))
slice(1, 3, None)
- slice_locs(self, start=None, end=None, step=None, kind=<no_default>) -> 'tuple[int, int]'
- Compute slice locations for input labels.
Parameters
----------
start : label, default None
If None, defaults to the beginning.
end : label, default None
If None, defaults to the end.
step : int, defaults None
If None, defaults to 1.
kind : {'loc', 'getitem'} or None
.. deprecated:: 1.4.0
Returns
-------
start, end : int
See Also
--------
Index.get_loc : Get location for a single label.
Notes
-----
This method only works if the index is monotonic or unique.
Examples
--------
>>> idx = pd.Index(list('abcd'))
>>> idx.slice_locs(start='b', end='c')
(1, 3)
- sort(self, *args, **kwargs)
- Use sort_values instead.
- sort_values(self, return_indexer: 'bool' = False, ascending: 'bool' = True, na_position: 'str_t' = 'last', key: 'Callable | None' = None)
- Return a sorted copy of the index.
Return a sorted copy of the index, and optionally return the indices
that sorted the index itself.
Parameters
----------
return_indexer : bool, default False
Should the indices that would sort the index be returned.
ascending : bool, default True
Should the index values be sorted in an ascending order.
na_position : {'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at
the end.
.. versionadded:: 1.2.0
key : callable, optional
If not None, apply the key function to the index values
before sorting. This is similar to the `key` argument in the
builtin :meth:`sorted` function, with the notable difference that
this `key` function should be *vectorized*. It should expect an
``Index`` and return an ``Index`` of the same shape.
.. versionadded:: 1.1.0
Returns
-------
sorted_index : pandas.Index
Sorted copy of the index.
indexer : numpy.ndarray, optional
The indices that the index itself was sorted by.
See Also
--------
Series.sort_values : Sort values of a Series.
DataFrame.sort_values : Sort values in a DataFrame.
Examples
--------
>>> idx = pd.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order, and also get the indices `idx` was
sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
- sortlevel(self, level=None, ascending=True, sort_remaining=None)
- For internal compatibility with the Index API.
Sort the Index. This is for compat with MultiIndex
Parameters
----------
ascending : bool, default True
False to sort in descending order
level, sort_remaining are compat parameters
Returns
-------
Index
- symmetric_difference(self, other, result_name=None, sort=None)
- Compute the symmetric difference of two Index objects.
Parameters
----------
other : Index or array-like
result_name : str
sort : False or None, default None
Whether to sort the resulting index. By default, the
values are attempted to be sorted, but any TypeError from
incomparable elements is caught by pandas.
* None : Attempt to sort the result, but catch any TypeErrors
from comparing incomparable elements.
* False : Do not sort the result.
Returns
-------
symmetric_difference : Index
Notes
-----
``symmetric_difference`` contains elements that appear in either
``idx1`` or ``idx2`` but not both. Equivalent to the Index created by
``idx1.difference(idx2) | idx2.difference(idx1)`` with duplicates
dropped.
Examples
--------
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([2, 3, 4, 5])
>>> idx1.symmetric_difference(idx2)
Int64Index([1, 5], dtype='int64')
- to_flat_index(self)
- Identity method.
This is implemented for compatibility with subclass implementations
when chaining.
Returns
-------
pd.Index
Caller.
See Also
--------
MultiIndex.to_flat_index : Subclass implementation.
- to_frame(self, index: 'bool' = True, name: 'Hashable' = <no_default>) -> 'DataFrame'
- Create a DataFrame with a column containing the Index.
Parameters
----------
index : bool, default True
Set the index of the returned DataFrame as the original Index.
name : object, default None
The passed name should substitute for the index name (if it has
one).
Returns
-------
DataFrame
DataFrame containing the original Index data.
See Also
--------
Index.to_series : Convert an Index to a Series.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame()
animal
animal
Ant Ant
Bear Bear
Cow Cow
By default, the original Index is reused. To enforce a new Index:
>>> idx.to_frame(index=False)
animal
0 Ant
1 Bear
2 Cow
To override the name of the resulting column, specify `name`:
>>> idx.to_frame(index=False, name='zoo')
zoo
0 Ant
1 Bear
2 Cow
- to_native_types(self, slicer=None, **kwargs) -> 'np.ndarray'
- Format specified values of `self` and return them.
.. deprecated:: 1.2.0
Parameters
----------
slicer : int, array-like
An indexer into `self` that specifies which values
are used in the formatting process.
kwargs : dict
Options for specifying how the values should be formatted.
These options include the following:
1) na_rep : str
The value that serves as a placeholder for NULL values
2) quoting : bool or None
Whether or not there are quoted values in `self`
3) date_format : str
The format used to represent date-like values.
Returns
-------
numpy.ndarray
Formatted values.
- to_series(self, index=None, name: 'Hashable' = None) -> 'Series'
- Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
Parameters
----------
index : Index, optional
Index of resulting Series. If None, defaults to original index.
name : str, optional
Name of resulting Series. If None, defaults to name of original
index.
Returns
-------
Series
The dtype will be based on the type of the Index values.
See Also
--------
Index.to_frame : Convert an Index to a DataFrame.
Series.to_frame : Convert Series to DataFrame.
Examples
--------
>>> idx = pd.Index(['Ant', 'Bear', 'Cow'], name='animal')
By default, the original Index and original name is reused.
>>> idx.to_series()
animal
Ant Ant
Bear Bear
Cow Cow
Name: animal, dtype: object
To enforce a new Index, specify new labels to ``index``:
>>> idx.to_series(index=[0, 1, 2])
0 Ant
1 Bear
2 Cow
Name: animal, dtype: object
To override the name of the resulting column, specify `name`:
>>> idx.to_series(name='zoo')
animal
Ant Ant
Bear Bear
Cow Cow
Name: zoo, dtype: object
- union(self, other, sort=None)
- Form the union of two Index objects.
If the Index objects are incompatible, both Index objects will be
cast to dtype('object') first.
.. versionchanged:: 0.25.0
Parameters
----------
other : Index or array-like
sort : bool or None, default None
Whether to sort the resulting Index.
* None : Sort the result, except when
1. `self` and `other` are equal.
2. `self` or `other` has length 0.
3. Some values in `self` or `other` cannot be compared.
A RuntimeWarning is issued in this case.
* False : do not sort the result.
Returns
-------
union : Index
Examples
--------
Union matching dtypes
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union mismatched dtypes
>>> idx1 = pd.Index(['a', 'b', 'c', 'd'])
>>> idx2 = pd.Index([1, 2, 3, 4])
>>> idx1.union(idx2)
Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
MultiIndex case
>>> idx1 = pd.MultiIndex.from_arrays(
... [[1, 1, 2, 2], ["Red", "Blue", "Red", "Blue"]]
... )
>>> idx1
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue')],
)
>>> idx2 = pd.MultiIndex.from_arrays(
... [[3, 3, 2, 2], ["Red", "Green", "Red", "Green"]]
... )
>>> idx2
MultiIndex([(3, 'Red'),
(3, 'Green'),
(2, 'Red'),
(2, 'Green')],
)
>>> idx1.union(idx2)
MultiIndex([(1, 'Blue'),
(1, 'Red'),
(2, 'Blue'),
(2, 'Green'),
(2, 'Red'),
(3, 'Green'),
(3, 'Red')],
)
>>> idx1.union(idx2, sort=False)
MultiIndex([(1, 'Red'),
(1, 'Blue'),
(2, 'Red'),
(2, 'Blue'),
(3, 'Red'),
(3, 'Green'),
(2, 'Green')],
)
- unique(self: '_IndexT', level: 'Hashable | None' = None) -> '_IndexT'
- Return unique values in the index.
Unique values are returned in order of appearance, this does NOT sort.
Parameters
----------
level : int or hashable, optional
Only return values from specified level (for MultiIndex).
If int, gets the level by integer position, else by level name.
Returns
-------
Index
See Also
--------
unique : Numpy array of unique values in that column.
Series.unique : Return unique values of Series object.
- view(self, cls=None)
- where(self, cond, other=None) -> 'Index'
- Replace values where the condition is False.
The replacement is taken from other.
Parameters
----------
cond : bool array-like with the same length as self
Condition to select the values on.
other : scalar, or array-like, default None
Replacement if the condition is False.
Returns
-------
pandas.Index
A copy of self with values replaced from other
where the condition is False.
See Also
--------
Series.where : Same method for Series.
DataFrame.where : Same method for DataFrame.
Examples
--------
>>> idx = pd.Index(['car', 'bike', 'train', 'tractor'])
>>> idx
Index(['car', 'bike', 'train', 'tractor'], dtype='object')
>>> idx.where(idx.isin(['car', 'train']), 'other')
Index(['car', 'other', 'train', 'other'], dtype='object')
Readonly properties inherited from pandas.core.indexes.base.Index:
- has_duplicates
- Check if the Index has duplicate values.
Returns
-------
bool
Whether or not the Index has duplicate values.
Examples
--------
>>> idx = pd.Index([1, 5, 7, 7])
>>> idx.has_duplicates
True
>>> idx = pd.Index([1, 5, 7])
>>> idx.has_duplicates
False
>>> idx = pd.Index(["Watermelon", "Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
True
>>> idx = pd.Index(["Orange", "Apple",
... "Watermelon"]).astype("category")
>>> idx.has_duplicates
False
- is_monotonic
- Alias for is_monotonic_increasing.
- is_monotonic_decreasing
- Return if the index is monotonic decreasing (only equal or
decreasing) values.
Examples
--------
>>> Index([3, 2, 1]).is_monotonic_decreasing
True
>>> Index([3, 2, 2]).is_monotonic_decreasing
True
>>> Index([3, 1, 2]).is_monotonic_decreasing
False
- is_monotonic_increasing
- Return if the index is monotonic increasing (only equal or
increasing) values.
Examples
--------
>>> Index([1, 2, 3]).is_monotonic_increasing
True
>>> Index([1, 2, 2]).is_monotonic_increasing
True
>>> Index([1, 3, 2]).is_monotonic_increasing
False
- nlevels
- Number of levels.
- shape
- Return a tuple of the shape of the underlying data.
Data descriptors inherited from pandas.core.indexes.base.Index:
- array
- The ExtensionArray of the data backing this Series or Index.
Returns
-------
ExtensionArray
An ExtensionArray of the values stored within. For extension
types, this is the actual array. For NumPy native types, this
is a thin (no copy) wrapper around :class:`numpy.ndarray`.
``.array`` differs ``.values`` which may require converting the
data to a different form.
See Also
--------
Index.to_numpy : Similar method that always returns a NumPy array.
Series.to_numpy : Similar method that always returns a NumPy array.
Notes
-----
This table lays out the different array types for each extension
dtype within pandas.
================== =============================
dtype array type
================== =============================
category Categorical
period PeriodArray
interval IntervalArray
IntegerNA IntegerArray
string StringArray
boolean BooleanArray
datetime64[ns, tz] DatetimeArray
================== =============================
For any 3rd-party extension types, the array type will be an
ExtensionArray.
For all remaining dtypes ``.array`` will be a
:class:`arrays.NumpyExtensionArray` wrapping the actual ndarray
stored within. If you absolutely need a NumPy array (possibly with
copying / coercing data), then use :meth:`Series.to_numpy` instead.
Examples
--------
For regular NumPy types like int, and float, a PandasArray
is returned.
>>> pd.Series([1, 2, 3]).array
<PandasArray>
[1, 2, 3]
Length: 3, dtype: int64
For extension types, like Categorical, the actual ExtensionArray
is returned
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.array
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
- dtype
- Return the dtype object of the underlying data.
- is_all_dates
- Whether or not the index values only consist of dates.
- is_unique
- Return if the index has unique values.
- name
- Return Index or MultiIndex name.
- names
Data and other attributes inherited from pandas.core.indexes.base.Index:
- str = <class 'pandas.core.strings.accessor.StringMethods'>
- Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python's string methods, with some inspiration from
R's stringr package.
Examples
--------
>>> s = pd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: object
>>> s.str.split("_")
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "")
0 AStrSeries
dtype: object
Methods inherited from pandas.core.base.IndexOpsMixin:
- __iter__(self)
- Return an iterator of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
iterator
- factorize(self, sort: 'bool' = False, na_sentinel: 'int | None' = -1)
- Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. `factorize`
is available as both a top-level function :func:`pandas.factorize`,
and as a method :meth:`Series.factorize` and :meth:`Index.factorize`.
Parameters
----------
sort : bool, default False
Sort `uniques` and shuffle `codes` to maintain the
relationship.
na_sentinel : int or None, default -1
Value to mark "not found". If None, will not drop the NaN
from the uniques of the values.
.. versionchanged:: 1.1.2
Returns
-------
codes : ndarray
An integer ndarray that's an indexer into `uniques`.
``uniques.take(codes)`` will have the same values as `values`.
uniques : ndarray, Index, or Categorical
The unique valid values. When `values` is Categorical, `uniques`
is a Categorical. When `values` is some other pandas object, an
`Index` is returned. Otherwise, a 1-D ndarray is returned.
.. note::
Even if there's a missing value in `values`, `uniques` will
*not* contain an entry for it.
See Also
--------
cut : Discretize continuous-valued array.
unique : Find the unique value in an array.
Notes
-----
Reference :ref:`the user guide <reshaping.factorize>` for more examples.
Examples
--------
These examples all show factorize as a top-level method like
``pd.factorize(values)``. The results are identical for methods like
:meth:`Series.factorize`.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
>>> codes
array([0, 0, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
With ``sort=True``, the `uniques` will be sorted, and `codes` will be
shuffled so that the relationship is the maintained.
>>> codes, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
>>> codes
array([1, 1, 0, 2, 1]...)
>>> uniques
array(['a', 'b', 'c'], dtype=object)
Missing values are indicated in `codes` with `na_sentinel`
(``-1`` by default). Note that missing values are never
included in `uniques`.
>>> codes, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
>>> codes
array([ 0, -1, 1, 2, 0]...)
>>> uniques
array(['b', 'a', 'c'], dtype=object)
Thus far, we've only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of `uniques`
will differ. For Categoricals, a `Categorical` is returned.
>>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
Notice that ``'b'`` is in ``uniques.categories``, despite not being
present in ``cat.values``.
For all other pandas objects, an Index of the appropriate type is
returned.
>>> cat = pd.Series(['a', 'a', 'c'])
>>> codes, uniques = pd.factorize(cat)
>>> codes
array([0, 0, 1]...)
>>> uniques
Index(['a', 'c'], dtype='object')
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting ``na_sentinel=None``.
>>> values = np.array([1, 2, 1, np.nan])
>>> codes, uniques = pd.factorize(values) # default: na_sentinel=-1
>>> codes
array([ 0, 1, 0, -1])
>>> uniques
array([1., 2.])
>>> codes, uniques = pd.factorize(values, na_sentinel=None)
>>> codes
array([0, 1, 0, 2])
>>> uniques
array([ 1., 2., nan])
- item(self)
- Return the first element of the underlying data as a Python scalar.
Returns
-------
scalar
The first element of %(klass)s.
Raises
------
ValueError
If the data is not length-1.
- nunique(self, dropna: 'bool' = True) -> 'int'
- Return number of unique elements in the object.
Excludes NA values by default.
Parameters
----------
dropna : bool, default True
Don't include NaN in the count.
Returns
-------
int
See Also
--------
DataFrame.nunique: Method nunique for DataFrame.
Series.count: Count non-NA/null observations in the Series.
Examples
--------
>>> s = pd.Series([1, 3, 5, 7, 7])
>>> s
0 1
1 3
2 5
3 7
4 7
dtype: int64
>>> s.nunique()
4
- searchsorted(self, value: 'NumpyValueArrayLike | ExtensionArray', side: "Literal[('left', 'right')]" = 'left', sorter: 'NumpySorter' = None) -> 'npt.NDArray[np.intp] | np.intp'
- Find indices where elements should be inserted to maintain order.
Find the indices into a sorted Index `self` such that, if the
corresponding elements in `value` were inserted before the indices,
the order of `self` would be preserved.
.. note::
The Index *must* be monotonically sorted, otherwise
wrong locations will likely be returned. Pandas does *not*
check this for you.
Parameters
----------
value : array-like or scalar
Values to insert into `self`.
side : {'left', 'right'}, optional
If 'left', the index of the first suitable location found is given.
If 'right', return the last such index. If there is no suitable
index, return either 0 or N (where N is the length of `self`).
sorter : 1-D array-like, optional
Optional array of integer indices that sort `self` into ascending
order. They are typically the result of ``np.argsort``.
Returns
-------
int or array of int
A scalar or array of insertion points with the
same shape as `value`.
See Also
--------
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.
Notes
-----
Binary search is used to find the required insertion points.
Examples
--------
>>> ser = pd.Series([1, 2, 3])
>>> ser
0 1
1 2
2 3
dtype: int64
>>> ser.searchsorted(4)
3
>>> ser.searchsorted([0, 4])
array([0, 3])
>>> ser.searchsorted([1, 3], side='left')
array([0, 2])
>>> ser.searchsorted([1, 3], side='right')
array([1, 3])
>>> ser = pd.Series(pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000']))
>>> ser
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
>>> ser.searchsorted('3/14/2000')
3
>>> ser = pd.Categorical(
... ['apple', 'bread', 'bread', 'cheese', 'milk'], ordered=True
... )
>>> ser
['apple', 'bread', 'bread', 'cheese', 'milk']
Categories (4, object): ['apple' < 'bread' < 'cheese' < 'milk']
>>> ser.searchsorted('bread')
1
>>> ser.searchsorted(['bread'], side='right')
array([3])
If the values are not monotonically sorted, wrong locations
may be returned:
>>> ser = pd.Series([2, 1, 3])
>>> ser
0 2
1 1
2 3
dtype: int64
>>> ser.searchsorted(1) # doctest: +SKIP
0 # wrong result, correct would be 1
- to_list = tolist(self)
- to_numpy(self, dtype: 'npt.DTypeLike | None' = None, copy: 'bool' = False, na_value=<no_default>, **kwargs) -> 'np.ndarray'
- A NumPy ndarray representing the values in this Series or Index.
Parameters
----------
dtype : str or numpy.dtype, optional
The dtype to pass to :meth:`numpy.asarray`.
copy : bool, default False
Whether to ensure that the returned value is not a view on
another array. Note that ``copy=False`` does not *ensure* that
``to_numpy()`` is no-copy. Rather, ``copy=True`` ensure that
a copy is made, even if not strictly necessary.
na_value : Any, optional
The value to use for missing values. The default value depends
on `dtype` and the type of the array.
.. versionadded:: 1.0.0
**kwargs
Additional keywords passed through to the ``to_numpy`` method
of the underlying array (for extension arrays).
.. versionadded:: 1.0.0
Returns
-------
numpy.ndarray
See Also
--------
Series.array : Get the actual data stored within.
Index.array : Get the actual data stored within.
DataFrame.to_numpy : Similar method for DataFrame.
Notes
-----
The returned array will be the same up to equality (values equal
in `self` will be equal in the returned array; likewise for values
that are not equal). When `self` contains an ExtensionArray, the
dtype may be different. For example, for a category-dtype Series,
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.
For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
we recommend doing that).
For extension types, ``to_numpy()`` *may* require copying data and
coercing the result to a NumPy type (possibly object), which may be
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.
================== ================================
dtype array type
================== ================================
category[T] ndarray[T] (same dtype as input)
period ndarray[object] (Periods)
interval ndarray[object] (Intervals)
IntegerNA ndarray[object]
datetime64[ns] datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
Examples
--------
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
>>> ser.to_numpy()
array(['a', 'b', 'a'], dtype=object)
Specify the `dtype` to control how datetime-aware data is represented.
Use ``dtype=object`` to return an ndarray of pandas :class:`Timestamp`
objects, each with the correct ``tz``.
>>> ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
>>> ser.to_numpy(dtype=object)
array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
Timestamp('2000-01-02 00:00:00+0100', tz='CET')],
dtype=object)
Or ``dtype='datetime64[ns]'`` to return an ndarray of native
datetime64 values. The values are converted to UTC and the timezone
info is dropped.
>>> ser.to_numpy(dtype="datetime64[ns]")
... # doctest: +ELLIPSIS
array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00...'],
dtype='datetime64[ns]')
- tolist(self)
- Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period)
Returns
-------
list
See Also
--------
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
- transpose(self: '_T', *args, **kwargs) -> '_T'
- Return the transpose, which is by definition self.
Returns
-------
%(klass)s
- value_counts(self, normalize: 'bool' = False, sort: 'bool' = True, ascending: 'bool' = False, bins=None, dropna: 'bool' = True)
- Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
Parameters
----------
normalize : bool, default False
If True then the object returned will contain the relative
frequencies of the unique values.
sort : bool, default True
Sort by frequencies.
ascending : bool, default False
Sort in ascending order.
bins : int, optional
Rather than count values, group them into half-open bins,
a convenience for ``pd.cut``, only works with numeric data.
dropna : bool, default True
Don't include counts of NaN.
Returns
-------
Series
See Also
--------
Series.count: Number of non-NA elements in a Series.
DataFrame.count: Number of non-NA elements in a DataFrame.
DataFrame.value_counts: Equivalent method on DataFrames.
Examples
--------
>>> index = pd.Index([3, 1, 2, 3, 4, np.nan])
>>> index.value_counts()
3.0 2
1.0 1
2.0 1
4.0 1
dtype: int64
With `normalize` set to `True`, returns the relative frequency by
dividing all values by the sum of values.
>>> s = pd.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True)
3.0 0.4
1.0 0.2
2.0 0.2
4.0 0.2
dtype: float64
**bins**
Bins can be useful for going from a continuous variable to a
categorical variable; instead of counting unique
apparitions of values, divide the index in the specified
number of half-open bins.
>>> s.value_counts(bins=3)
(0.996, 2.0] 2
(2.0, 3.0] 2
(3.0, 4.0] 1
dtype: int64
**dropna**
With `dropna` set to `False` we can also see NaN index values.
>>> s.value_counts(dropna=False)
3.0 2
1.0 1
2.0 1
4.0 1
NaN 1
dtype: int64
Readonly properties inherited from pandas.core.base.IndexOpsMixin:
- T
- Return the transpose, which is by definition self.
- empty
- nbytes
- Return the number of bytes in the underlying data.
- ndim
- Number of dimensions of the underlying data, by definition 1.
- size
- Return the number of elements in the underlying data.
Data and other attributes inherited from pandas.core.base.IndexOpsMixin:
- __array_priority__ = 1000
Methods inherited from pandas.core.arraylike.OpsMixin:
- __add__(self, other)
- __divmod__(self, other)
- __eq__(self, other)
- Return self==value.
- __floordiv__(self, other)
- __ge__(self, other)
- Return self>=value.
- __gt__(self, other)
- Return self>value.
- __le__(self, other)
- Return self<=value.
- __lt__(self, other)
- Return self<value.
- __mod__(self, other)
- __mul__(self, other)
- __ne__(self, other)
- Return self!=value.
- __pow__(self, other)
- __radd__(self, other)
- __rand__(self, other)
- __rdivmod__(self, other)
- __rfloordiv__(self, other)
- __rmod__(self, other)
- __rmul__(self, other)
- __ror__(self, other)
- __rpow__(self, other)
- __rsub__(self, other)
- __rtruediv__(self, other)
- __rxor__(self, other)
- __sub__(self, other)
- __truediv__(self, other)
Data descriptors inherited from pandas.core.arraylike.OpsMixin:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
Data and other attributes inherited from pandas.core.arraylike.OpsMixin:
- __hash__ = None
Methods inherited from pandas.core.base.PandasObject:
- __sizeof__(self) -> 'int'
- Generates the total memory usage for an object that returns
either a value or Series of values
Methods inherited from pandas.core.accessor.DirNamesMixin:
- __dir__(self) -> 'list[str]'
- Provide method name lookup and completion.
Notes
-----
Only provide 'public' methods.
|
class Timestamp(_Timestamp) |
| |
Timestamp(ts_input=<object object at 0x000002042F5718D0>, freq=None, tz=None, unit=None, year=None, month=None, day=None, hour=None, minute=None, second=None, microsecond=None, nanosecond=None, tzinfo=None, *, fold=None)
Pandas replacement for python datetime.datetime object.
Timestamp is the pandas equivalent of python's Datetime
and is interchangeable with it in most cases. It's the type used
for the entries that make up a DatetimeIndex, and other timeseries
oriented data structures in pandas.
Parameters
----------
ts_input : datetime-like, str, int, float
Value to be converted to Timestamp.
freq : str, DateOffset
Offset which Timestamp will have.
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone for time which Timestamp will have.
unit : str
Unit used for conversion if ts_input is of type int or float. The
valid values are 'D', 'h', 'm', 's', 'ms', 'us', and 'ns'. For
example, 's' means seconds and 'ms' means milliseconds.
year, month, day : int
hour, minute, second, microsecond : int, optional, default 0
nanosecond : int, optional, default 0
tzinfo : datetime.tzinfo, optional, default None
fold : {0, 1}, default None, keyword-only
Due to daylight saving time, one wall clock time can occur twice
when shifting from summer to winter time; fold describes whether the
datetime-like corresponds to the first (0) or the second time (1)
the wall clock hits the ambiguous time.
.. versionadded:: 1.1.0
Notes
-----
There are essentially three calling conventions for the constructor. The
primary form accepts four parameters. They can be passed by position or
keyword.
The other two forms mimic the parameters from ``datetime.datetime``. They
can be passed by either position or keyword, but not both mixed together.
Examples
--------
Using the primary calling convention:
This converts a datetime-like string
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
This converts a float representing a Unix epoch in units of seconds
>>> pd.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
This converts an int representing a Unix-epoch in units of seconds
and for a particular timezone
>>> pd.Timestamp(1513393355, unit='s', tz='US/Pacific')
Timestamp('2017-12-15 19:02:35-0800', tz='US/Pacific')
Using the other two forms that mimic the API for ``datetime.datetime``:
>>> pd.Timestamp(2017, 1, 1, 12)
Timestamp('2017-01-01 12:00:00')
>>> pd.Timestamp(year=2017, month=1, day=1, hour=12)
Timestamp('2017-01-01 12:00:00') |
| |
- Method resolution order:
- Timestamp
- _Timestamp
- pandas._libs.tslibs.base.ABCTimestamp
- datetime.datetime
- datetime.date
- builtins.object
Methods defined here:
- __new__(cls, ts_input=<object object at 0x000002042F5718D0>, freq=None, tz=None, unit=None, year=None, month=None, day=None, hour=None, minute=None, second=None, microsecond=None, nanosecond=None, tzinfo=None, *, fold=None)
- astimezone = tz_convert(self, tz)
- ceil(self, freq, ambiguous='raise', nonexistent='raise')
- Return a new Timestamp ceiled to this resolution.
Parameters
----------
freq : str
Frequency string indicating the ceiling resolution.
ambiguous : bool or {'raise', 'NaT'}, default 'raise'
The behavior is as follows:
* bool contains flags to determine if time is dst or not (note
that this flag is only applicable for ambiguous fall dst dates).
* 'NaT' will return NaT for an ambiguous time.
* 'raise' will raise an AmbiguousTimeError for an ambiguous time.
nonexistent : {'raise', 'shift_forward', 'shift_backward, 'NaT', timedelta}, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
* 'shift_forward' will shift the nonexistent time forward to the
closest existing time.
* 'shift_backward' will shift the nonexistent time backward to the
closest existing time.
* 'NaT' will return NaT where there are nonexistent times.
* timedelta objects will shift nonexistent times by the timedelta.
* 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Raises
------
ValueError if the freq cannot be converted.
Notes
-----
If the Timestamp has a timezone, ceiling will take place relative to the
local ("wall") time and re-localized to the same timezone. When ceiling
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
Create a timestamp object:
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
A timestamp can be ceiled using multiple frequency units:
>>> ts.ceil(freq='H') # hour
Timestamp('2020-03-14 16:00:00')
>>> ts.ceil(freq='T') # minute
Timestamp('2020-03-14 15:33:00')
>>> ts.ceil(freq='S') # seconds
Timestamp('2020-03-14 15:32:53')
>>> ts.ceil(freq='U') # microseconds
Timestamp('2020-03-14 15:32:52.192549')
``freq`` can also be a multiple of a single unit, like '5T' (i.e. 5 minutes):
>>> ts.ceil(freq='5T')
Timestamp('2020-03-14 15:35:00')
or a combination of multiple units, like '1H30T' (i.e. 1 hour and 30 minutes):
>>> ts.ceil(freq='1H30T')
Timestamp('2020-03-14 16:30:00')
Analogous for ``pd.NaT``:
>>> pd.NaT.ceil()
NaT
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> ts_tz = pd.Timestamp("2021-10-31 01:30:00").tz_localize("Europe/Amsterdam")
>>> ts_tz.ceil("H", ambiguous=False)
Timestamp('2021-10-31 02:00:00+0100', tz='Europe/Amsterdam')
>>> ts_tz.ceil("H", ambiguous=True)
Timestamp('2021-10-31 02:00:00+0200', tz='Europe/Amsterdam')
- floor(self, freq, ambiguous='raise', nonexistent='raise')
- Return a new Timestamp floored to this resolution.
Parameters
----------
freq : str
Frequency string indicating the flooring resolution.
ambiguous : bool or {'raise', 'NaT'}, default 'raise'
The behavior is as follows:
* bool contains flags to determine if time is dst or not (note
that this flag is only applicable for ambiguous fall dst dates).
* 'NaT' will return NaT for an ambiguous time.
* 'raise' will raise an AmbiguousTimeError for an ambiguous time.
nonexistent : {'raise', 'shift_forward', 'shift_backward, 'NaT', timedelta}, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
* 'shift_forward' will shift the nonexistent time forward to the
closest existing time.
* 'shift_backward' will shift the nonexistent time backward to the
closest existing time.
* 'NaT' will return NaT where there are nonexistent times.
* timedelta objects will shift nonexistent times by the timedelta.
* 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Raises
------
ValueError if the freq cannot be converted.
Notes
-----
If the Timestamp has a timezone, flooring will take place relative to the
local ("wall") time and re-localized to the same timezone. When flooring
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
Create a timestamp object:
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
A timestamp can be floored using multiple frequency units:
>>> ts.floor(freq='H') # hour
Timestamp('2020-03-14 15:00:00')
>>> ts.floor(freq='T') # minute
Timestamp('2020-03-14 15:32:00')
>>> ts.floor(freq='S') # seconds
Timestamp('2020-03-14 15:32:52')
>>> ts.floor(freq='N') # nanoseconds
Timestamp('2020-03-14 15:32:52.192548651')
``freq`` can also be a multiple of a single unit, like '5T' (i.e. 5 minutes):
>>> ts.floor(freq='5T')
Timestamp('2020-03-14 15:30:00')
or a combination of multiple units, like '1H30T' (i.e. 1 hour and 30 minutes):
>>> ts.floor(freq='1H30T')
Timestamp('2020-03-14 15:00:00')
Analogous for ``pd.NaT``:
>>> pd.NaT.floor()
NaT
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> ts_tz = pd.Timestamp("2021-10-31 03:30:00").tz_localize("Europe/Amsterdam")
>>> ts_tz.floor("2H", ambiguous=False)
Timestamp('2021-10-31 02:00:00+0100', tz='Europe/Amsterdam')
>>> ts_tz.floor("2H", ambiguous=True)
Timestamp('2021-10-31 02:00:00+0200', tz='Europe/Amsterdam')
- isoweekday(self)
- Return the day of the week represented by the date.
Monday == 1 ... Sunday == 7.
- replace(self, year=None, month=None, day=None, hour=None, minute=None, second=None, microsecond=None, nanosecond=None, tzinfo=<class 'object'>, fold=None)
- Implements datetime.replace, handles nanoseconds.
Parameters
----------
year : int, optional
month : int, optional
day : int, optional
hour : int, optional
minute : int, optional
second : int, optional
microsecond : int, optional
nanosecond : int, optional
tzinfo : tz-convertible, optional
fold : int, optional
Returns
-------
Timestamp with fields replaced
Examples
--------
Create a timestamp object:
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651', tz='UTC')
>>> ts
Timestamp('2020-03-14 15:32:52.192548651+0000', tz='UTC')
Replace year and the hour:
>>> ts.replace(year=1999, hour=10)
Timestamp('1999-03-14 10:32:52.192548651+0000', tz='UTC')
Replace timezone (not a conversion):
>>> import pytz
>>> ts.replace(tzinfo=pytz.timezone('US/Pacific'))
Timestamp('2020-03-14 15:32:52.192548651-0700', tz='US/Pacific')
Analogous for ``pd.NaT``:
>>> pd.NaT.replace(tzinfo=pytz.timezone('US/Pacific'))
NaT
- round(self, freq, ambiguous='raise', nonexistent='raise')
- Round the Timestamp to the specified resolution.
Parameters
----------
freq : str
Frequency string indicating the rounding resolution.
ambiguous : bool or {'raise', 'NaT'}, default 'raise'
The behavior is as follows:
* bool contains flags to determine if time is dst or not (note
that this flag is only applicable for ambiguous fall dst dates).
* 'NaT' will return NaT for an ambiguous time.
* 'raise' will raise an AmbiguousTimeError for an ambiguous time.
nonexistent : {'raise', 'shift_forward', 'shift_backward, 'NaT', timedelta}, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
* 'shift_forward' will shift the nonexistent time forward to the
closest existing time.
* 'shift_backward' will shift the nonexistent time backward to the
closest existing time.
* 'NaT' will return NaT where there are nonexistent times.
* timedelta objects will shift nonexistent times by the timedelta.
* 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
a new Timestamp rounded to the given resolution of `freq`
Raises
------
ValueError if the freq cannot be converted
Notes
-----
If the Timestamp has a timezone, rounding will take place relative to the
local ("wall") time and re-localized to the same timezone. When rounding
near daylight savings time, use ``nonexistent`` and ``ambiguous`` to
control the re-localization behavior.
Examples
--------
Create a timestamp object:
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
A timestamp can be rounded using multiple frequency units:
>>> ts.round(freq='H') # hour
Timestamp('2020-03-14 16:00:00')
>>> ts.round(freq='T') # minute
Timestamp('2020-03-14 15:33:00')
>>> ts.round(freq='S') # seconds
Timestamp('2020-03-14 15:32:52')
>>> ts.round(freq='L') # milliseconds
Timestamp('2020-03-14 15:32:52.193000')
``freq`` can also be a multiple of a single unit, like '5T' (i.e. 5 minutes):
>>> ts.round(freq='5T')
Timestamp('2020-03-14 15:35:00')
or a combination of multiple units, like '1H30T' (i.e. 1 hour and 30 minutes):
>>> ts.round(freq='1H30T')
Timestamp('2020-03-14 15:00:00')
Analogous for ``pd.NaT``:
>>> pd.NaT.round()
NaT
When rounding near a daylight savings time transition, use ``ambiguous`` or
``nonexistent`` to control how the timestamp should be re-localized.
>>> ts_tz = pd.Timestamp("2021-10-31 01:30:00").tz_localize("Europe/Amsterdam")
>>> ts_tz.round("H", ambiguous=False)
Timestamp('2021-10-31 02:00:00+0100', tz='Europe/Amsterdam')
>>> ts_tz.round("H", ambiguous=True)
Timestamp('2021-10-31 02:00:00+0200', tz='Europe/Amsterdam')
- strftime(self, format)
- Timestamp.strftime(format)
Return a string representing the given POSIX timestamp
controlled by an explicit format string.
Parameters
----------
format : str
Format string to convert Timestamp to string.
See strftime documentation for more information on the format string:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> ts.strftime('%Y-%m-%d %X')
'2020-03-14 15:32:52'
- to_julian_date(self) -> numpy.float64
- Convert TimeStamp to a Julian Date.
0 Julian date is noon January 1, 4713 BC.
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52')
>>> ts.to_julian_date()
2458923.147824074
- tz_convert(self, tz)
- Convert timezone-aware Timestamp to another time zone.
Parameters
----------
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone for time which Timestamp will be converted to.
None will remove timezone holding UTC time.
Returns
-------
converted : Timestamp
Raises
------
TypeError
If Timestamp is tz-naive.
Examples
--------
Create a timestamp object with UTC timezone:
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651', tz='UTC')
>>> ts
Timestamp('2020-03-14 15:32:52.192548651+0000', tz='UTC')
Change to Tokyo timezone:
>>> ts.tz_convert(tz='Asia/Tokyo')
Timestamp('2020-03-15 00:32:52.192548651+0900', tz='Asia/Tokyo')
Can also use ``astimezone``:
>>> ts.astimezone(tz='Asia/Tokyo')
Timestamp('2020-03-15 00:32:52.192548651+0900', tz='Asia/Tokyo')
Analogous for ``pd.NaT``:
>>> pd.NaT.tz_convert(tz='Asia/Tokyo')
NaT
- tz_localize(self, tz, ambiguous='raise', nonexistent='raise')
- Convert naive Timestamp to local time zone, or remove
timezone from timezone-aware Timestamp.
Parameters
----------
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone for time which Timestamp will be converted to.
None will remove timezone holding local time.
ambiguous : bool, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.
The behavior is as follows:
* bool contains flags to determine if time is dst or not (note
that this flag is only applicable for ambiguous fall dst dates).
* 'NaT' will return NaT for an ambiguous time.
* 'raise' will raise an AmbiguousTimeError for an ambiguous time.
nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta, default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
The behavior is as follows:
* 'shift_forward' will shift the nonexistent time forward to the
closest existing time.
* 'shift_backward' will shift the nonexistent time backward to the
closest existing time.
* 'NaT' will return NaT where there are nonexistent times.
* timedelta objects will shift nonexistent times by the timedelta.
* 'raise' will raise an NonExistentTimeError if there are
nonexistent times.
Returns
-------
localized : Timestamp
Raises
------
TypeError
If the Timestamp is tz-aware and tz is not None.
Examples
--------
Create a naive timestamp object:
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> ts
Timestamp('2020-03-14 15:32:52.192548651')
Add 'Europe/Stockholm' as timezone:
>>> ts.tz_localize(tz='Europe/Stockholm')
Timestamp('2020-03-14 15:32:52.192548651+0100', tz='Europe/Stockholm')
Analogous for ``pd.NaT``:
>>> pd.NaT.tz_localize()
NaT
- weekday(self)
- Return the day of the week represented by the date.
Monday == 0 ... Sunday == 6.
Class methods defined here:
- combine(date, time) from builtins.type
- Timestamp.combine(date, time)
Combine date, time into datetime with same date and time fields.
Examples
--------
>>> from datetime import date, time
>>> pd.Timestamp.combine(date(2020, 3, 14), time(15, 30, 15))
Timestamp('2020-03-14 15:30:15')
- fromordinal(ordinal, freq=None, tz=None) from builtins.type
- Timestamp.fromordinal(ordinal, freq=None, tz=None)
Passed an ordinal, translate and convert to a ts.
Note: by definition there cannot be any tz info on the ordinal itself.
Parameters
----------
ordinal : int
Date corresponding to a proleptic Gregorian ordinal.
freq : str, DateOffset
Offset to apply to the Timestamp.
tz : str, pytz.timezone, dateutil.tz.tzfile or None
Time zone for the Timestamp.
Examples
--------
>>> pd.Timestamp.fromordinal(737425)
Timestamp('2020-01-01 00:00:00')
- fromtimestamp(ts, tz=None) from builtins.type
- Timestamp.fromtimestamp(ts)
Transform timestamp[, tz] to tz's local time from POSIX timestamp.
Examples
--------
>>> pd.Timestamp.fromtimestamp(1584199972)
Timestamp('2020-03-14 15:32:52')
Note that the output may change depending on your local time.
- now(tz=None) from builtins.type
- Timestamp.now(tz=None)
Return new Timestamp object representing current time local to
tz.
Parameters
----------
tz : str or timezone object, default None
Timezone to localize to.
Examples
--------
>>> pd.Timestamp.now() # doctest: +SKIP
Timestamp('2020-11-16 22:06:16.378782')
Analogous for ``pd.NaT``:
>>> pd.NaT.now()
NaT
- strptime(date_string, format) from builtins.type
- Timestamp.strptime(string, format)
Function is not implemented. Use pd.to_datetime().
- today(tz=None) from builtins.type
- Timestamp.today(cls, tz=None)
Return the current time in the local timezone. This differs
from datetime.today() in that it can be localized to a
passed timezone.
Parameters
----------
tz : str or timezone object, default None
Timezone to localize to.
Examples
--------
>>> pd.Timestamp.today() # doctest: +SKIP
Timestamp('2020-11-16 22:37:39.969883')
Analogous for ``pd.NaT``:
>>> pd.NaT.today()
NaT
- utcfromtimestamp(ts) from builtins.type
- Timestamp.utcfromtimestamp(ts)
Construct a naive UTC datetime from a POSIX timestamp.
Examples
--------
>>> pd.Timestamp.utcfromtimestamp(1584199972)
Timestamp('2020-03-14 15:32:52')
- utcnow() from builtins.type
- Timestamp.utcnow()
Return a new Timestamp representing UTC day and time.
Examples
--------
>>> pd.Timestamp.utcnow() # doctest: +SKIP
Timestamp('2020-11-16 22:50:18.092888+0000', tz='UTC')
Readonly properties defined here:
- freqstr
- Return the total number of days in the month.
Data descriptors defined here:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
- daysinmonth
- Return the number of days in the month.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31
- tz
- Alias for tzinfo.
Examples
--------
>>> ts = pd.Timestamp(1584226800, unit='s', tz='Europe/Stockholm')
>>> ts.tz
<DstTzInfo 'Europe/Stockholm' CET+1:00:00 STD>
- weekofyear
- Return the week number of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.week
11
Data and other attributes defined here:
- max = Timestamp('2262-04-11 23:47:16.854775807')
- min = Timestamp('1677-09-21 00:12:43.145224193')
- resolution = Timedelta('0 days 00:00:00.000000001')
Methods inherited from _Timestamp:
- __add__(self, value, /)
- Return self+value.
- __eq__(self, value, /)
- Return self==value.
- __ge__(self, value, /)
- Return self>=value.
- __gt__(self, value, /)
- Return self>value.
- __hash__(self, /)
- Return hash(self).
- __le__(self, value, /)
- Return self<=value.
- __lt__(self, value, /)
- Return self<value.
- __ne__(self, value, /)
- Return self!=value.
- __radd__(self, value, /)
- Return value+self.
- __reduce__(...)
- __reduce__() -> (cls, state)
- __reduce_ex__(...)
- __reduce_ex__(proto) -> (cls, state)
- __repr__(self, /)
- Return repr(self).
- __rsub__(self, value, /)
- Return value-self.
- __setstate__(...)
- __sub__(self, value, /)
- Return self-value.
- day_name(...)
- Return the day name of the Timestamp with specified locale.
Parameters
----------
locale : str, default None (English locale)
Locale determining the language in which to return the day name.
Returns
-------
str
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> ts.day_name()
'Saturday'
Analogous for ``pd.NaT``:
>>> pd.NaT.day_name()
nan
- isoformat(...)
- Return the time formatted according to ISO 8610.
The full format looks like 'YYYY-MM-DD HH:MM:SS.mmmmmmnnn'.
By default, the fractional part is omitted if self.microsecond == 0
and self.nanosecond == 0.
If self.tzinfo is not None, the UTC offset is also attached, giving
giving a full format of 'YYYY-MM-DD HH:MM:SS.mmmmmmnnn+HH:MM'.
Parameters
----------
sep : str, default 'T'
String used as the separator between the date and time.
timespec : str, default 'auto'
Specifies the number of additional terms of the time to include.
The valid values are 'auto', 'hours', 'minutes', 'seconds',
'milliseconds', 'microseconds', and 'nanoseconds'.
Returns
-------
str
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> ts.isoformat()
'2020-03-14T15:32:52.192548651'
>>> ts.isoformat(timespec='microseconds')
'2020-03-14T15:32:52.192548'
- month_name(...)
- Return the month name of the Timestamp with specified locale.
Parameters
----------
locale : str, default None (English locale)
Locale determining the language in which to return the month name.
Returns
-------
str
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> ts.month_name()
'March'
Analogous for ``pd.NaT``:
>>> pd.NaT.month_name()
nan
- normalize(...)
- Normalize Timestamp to midnight, preserving tz information.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14, 15, 30)
>>> ts.normalize()
Timestamp('2020-03-14 00:00:00')
- timestamp(...)
- Return POSIX timestamp as float.
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548')
>>> ts.timestamp()
1584199972.192548
- to_datetime64(...)
- Return a numpy.datetime64 object with 'ns' precision.
- to_numpy(...)
- Convert the Timestamp to a NumPy datetime64.
.. versionadded:: 0.25.0
This is an alias method for `Timestamp.to_datetime64()`. The dtype and
copy parameters are available here only for compatibility. Their values
will not affect the return value.
Returns
-------
numpy.datetime64
See Also
--------
DatetimeIndex.to_numpy : Similar method for DatetimeIndex.
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> ts.to_numpy()
numpy.datetime64('2020-03-14T15:32:52.192548651')
Analogous for ``pd.NaT``:
>>> pd.NaT.to_numpy()
numpy.datetime64('NaT')
- to_period(...)
- Return an period of which this timestamp is an observation.
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548651')
>>> # Year end frequency
>>> ts.to_period(freq='Y')
Period('2020', 'A-DEC')
>>> # Month end frequency
>>> ts.to_period(freq='M')
Period('2020-03', 'M')
>>> # Weekly frequency
>>> ts.to_period(freq='W')
Period('2020-03-09/2020-03-15', 'W-SUN')
>>> # Quarter end frequency
>>> ts.to_period(freq='Q')
Period('2020Q1', 'Q-DEC')
- to_pydatetime(...)
- Convert a Timestamp object to a native Python datetime object.
If warn=True, issue a warning if nanoseconds is nonzero.
Examples
--------
>>> ts = pd.Timestamp('2020-03-14T15:32:52.192548')
>>> ts.to_pydatetime()
datetime.datetime(2020, 3, 14, 15, 32, 52, 192548)
Analogous for ``pd.NaT``:
>>> pd.NaT.to_pydatetime()
NaT
Data descriptors inherited from _Timestamp:
- asm8
- Return numpy datetime64 format in nanoseconds.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14, 15)
>>> ts.asm8
numpy.datetime64('2020-03-14T15:00:00.000000000')
- day_of_week
- Return day of the week.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5
- day_of_year
- Return the day of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74
- dayofweek
- Return day of the week.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_week
5
- dayofyear
- Return the day of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.day_of_year
74
- days_in_month
- Return the number of days in the month.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.days_in_month
31
- freq
- is_leap_year
- Return True if year is a leap year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_leap_year
True
- is_month_end
- Return True if date is last day of month.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_month_end
True
- is_month_start
- Return True if date is first day of month.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_month_start
False
>>> ts = pd.Timestamp(2020, 1, 1)
>>> ts.is_month_start
True
- is_quarter_end
- Return True if date is last day of the quarter.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_end
False
>>> ts = pd.Timestamp(2020, 3, 31)
>>> ts.is_quarter_end
True
- is_quarter_start
- Return True if date is first day of the quarter.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_quarter_start
False
>>> ts = pd.Timestamp(2020, 4, 1)
>>> ts.is_quarter_start
True
- is_year_end
- Return True if date is last day of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_year_end
False
>>> ts = pd.Timestamp(2020, 12, 31)
>>> ts.is_year_end
True
- is_year_start
- Return True if date is first day of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.is_year_start
False
>>> ts = pd.Timestamp(2020, 1, 1)
>>> ts.is_year_start
True
- nanosecond
- quarter
- Return the quarter of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.quarter
1
- value
- week
- Return the week number of the year.
Examples
--------
>>> ts = pd.Timestamp(2020, 3, 14)
>>> ts.week
11
Data and other attributes inherited from _Timestamp:
- __array_priority__ = 100
- __pyx_vtable__ = <capsule object NULL>
Methods inherited from pandas._libs.tslibs.base.ABCTimestamp:
- __reduce_cython__(...)
- __setstate_cython__(...)
Methods inherited from datetime.datetime:
- __getattribute__(self, name, /)
- Return getattr(self, name).
- __str__(self, /)
- Return str(self).
- ctime(...)
- Return ctime() style string.
- date(...)
- Return date object with same year, month and day.
- dst(...)
- Return self.tzinfo.dst(self).
- time(...)
- Return time object with same time but with tzinfo=None.
- timetuple(...)
- Return time tuple, compatible with time.localtime().
- timetz(...)
- Return time object with same time and tzinfo.
- tzname(...)
- Return self.tzinfo.tzname(self).
- utcoffset(...)
- Return self.tzinfo.utcoffset(self).
- utctimetuple(...)
- Return UTC time tuple, compatible with time.localtime().
Class methods inherited from datetime.datetime:
- fromisoformat(...) from builtins.type
- string -> datetime from datetime.isoformat() output
Data descriptors inherited from datetime.datetime:
- fold
- hour
- microsecond
- minute
- second
- tzinfo
Methods inherited from datetime.date:
- __format__(...)
- Formats self with strftime.
- isocalendar(...)
- Return a 3-tuple containing ISO year, week number, and weekday.
- toordinal(...)
- Return proleptic Gregorian ordinal. January 1 of year 1 is day 1.
Class methods inherited from datetime.date:
- fromisocalendar(...) from builtins.type
- int, int, int -> Construct a date from the ISO year, week number and weekday.
This is the inverse of the date.isocalendar() function
Data descriptors inherited from datetime.date:
- day
- month
- year
|
class UInt16Dtype(_IntegerDtype) |
| |
An ExtensionDtype for uint16 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- UInt16Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'UInt16'
- type = <class 'numpy.uint16'>
- Unsigned integer type, compatible with C ``unsigned short``.
:Character code: ``'H'``
:Canonical name: `numpy.ushort`
:Alias on this platform (Windows AMD64): `numpy.uint16`: 16-bit unsigned integer (``0`` to ``65_535``).
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class UInt32Dtype(_IntegerDtype) |
| |
An ExtensionDtype for uint32 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- UInt32Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'UInt32'
- type = <class 'numpy.uint32'>
- Unsigned integer type, compatible with C ``unsigned long``.
:Character code: ``'L'``
:Canonical name: `numpy.uint`
:Alias on this platform (Windows AMD64): `numpy.uint32`: 32-bit unsigned integer (``0`` to ``4_294_967_295``).
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class UInt64Dtype(_IntegerDtype) |
| |
An ExtensionDtype for uint64 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- UInt64Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'UInt64'
- type = <class 'numpy.uint64'>
- Signed integer type, compatible with C ``unsigned long long``.
:Character code: ``'Q'``
:Canonical name: `numpy.ulonglong`
:Alias on this platform (Windows AMD64): `numpy.uint64`: 64-bit unsigned integer (``0`` to ``18_446_744_073_709_551_615``).
:Alias on this platform (Windows AMD64): `numpy.uintp`: Unsigned integer large enough to fit pointer, compatible with C ``uintptr_t``.
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class UInt8Dtype(_IntegerDtype) |
| |
An ExtensionDtype for uint8 integer data.
.. versionchanged:: 1.0.0
Now uses :attr:`pandas.NA` as its missing value,
rather than :attr:`numpy.nan`.
Attributes
----------
None
Methods
-------
None |
| |
- Method resolution order:
- UInt8Dtype
- _IntegerDtype
- pandas.core.arrays.numeric.NumericDtype
- pandas.core.arrays.masked.BaseMaskedDtype
- pandas.core.dtypes.base.ExtensionDtype
- builtins.object
Data and other attributes defined here:
- name = 'UInt8'
- type = <class 'numpy.uint8'>
- Unsigned integer type, compatible with C ``unsigned char``.
:Character code: ``'B'``
:Canonical name: `numpy.ubyte`
:Alias on this platform (Windows AMD64): `numpy.uint8`: 8-bit unsigned integer (``0`` to ``255``).
Methods inherited from _IntegerDtype:
- __repr__(self) -> 'str'
- Return repr(self).
Class methods inherited from _IntegerDtype:
- construct_array_type() -> 'type[IntegerArray]' from builtins.type
- Return the array type associated with this dtype.
Returns
-------
type
Data descriptors inherited from _IntegerDtype:
- is_signed_integer
- is_unsigned_integer
Methods inherited from pandas.core.arrays.numeric.NumericDtype:
- __from_arrow__(self, array: 'pyarrow.Array | pyarrow.ChunkedArray') -> 'BaseMaskedArray'
- Construct IntegerArray/FloatingArray from pyarrow Array/ChunkedArray.
Data descriptors inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- itemsize
- Return the number of bytes in this dtype
- kind
- numpy_dtype
- Return an instance of our numpy dtype
Data and other attributes inherited from pandas.core.arrays.masked.BaseMaskedDtype:
- __annotations__ = {'name': 'str', 'type': 'type'}
- base = None
- na_value = <NA>
Methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- __eq__(self, other: 'Any') -> 'bool'
- Check whether 'other' is equal to self.
By default, 'other' is considered equal if either
* it's a string matching 'self.name'.
* it's an instance of this type and all of the attributes
in ``self._metadata`` are equal between `self` and `other`.
Parameters
----------
other : Any
Returns
-------
bool
- __hash__(self) -> 'int'
- Return hash(self).
- __ne__(self, other: 'Any') -> 'bool'
- Return self!=value.
- __str__(self) -> 'str'
- Return str(self).
- empty(self, shape: 'Shape') -> 'type_t[ExtensionArray]'
- Construct an ExtensionArray of this dtype with the given shape.
Analogous to numpy.empty.
Parameters
----------
shape : int or tuple[int]
Returns
-------
ExtensionArray
Class methods inherited from pandas.core.dtypes.base.ExtensionDtype:
- construct_from_string(string: 'str') -> 'ExtensionDtypeT' from builtins.type
- Construct this type from a string.
This is useful mainly for data types that accept parameters.
For example, a period dtype accepts a frequency parameter that
can be set as ``period[H]`` (where H means hourly frequency).
By default, in the abstract class, just the name of the type is
expected. But subclasses can overwrite this method to accept
parameters.
Parameters
----------
string : str
The name of the type, for example ``category``.
Returns
-------
ExtensionDtype
Instance of the dtype.
Raises
------
TypeError
If a class cannot be constructed from this 'string'.
Examples
--------
For extension dtypes with arguments the following may be an
adequate implementation.
>>> @classmethod
... def construct_from_string(cls, string):
... pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
... match = pattern.match(string)
... if match:
... return cls(**match.groupdict())
... else:
... raise TypeError(
... f"Cannot construct a '{cls.__name__}' from '{string}'"
... )
- is_dtype(dtype: 'object') -> 'bool' from builtins.type
- Check if we match 'dtype'.
Parameters
----------
dtype : object
The object to check.
Returns
-------
bool
Notes
-----
The default implementation is True if
1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
Readonly properties inherited from pandas.core.dtypes.base.ExtensionDtype:
- names
- Ordered list of field names, or None if there are no fields.
This is for compatibility with NumPy arrays, and may be removed in the
future.
Data descriptors inherited from pandas.core.dtypes.base.ExtensionDtype:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
|
class option_context(contextlib.ContextDecorator) |
| |
option_context(*args)
Context manager to temporarily set options in the `with` statement context.
You need to invoke as ``option_context(pat, val, [(pat, val), ...])``.
Examples
--------
>>> with option_context('display.max_rows', 10, 'display.max_columns', 5):
... pass |
| |
- Method resolution order:
- option_context
- contextlib.ContextDecorator
- builtins.object
Methods defined here:
- __enter__(self)
- __exit__(self, *args)
- __init__(self, *args)
- Initialize self. See help(type(self)) for accurate signature.
Methods inherited from contextlib.ContextDecorator:
- __call__(self, func)
- Call self as a function.
Data descriptors inherited from contextlib.ContextDecorator:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)
| |